Inside VMware NSX

From VTEPs to OVSDB, VMware's NSX introduces unfamiliar protocols and technologies to the data center network. Check out this overview to get up to speed.

Brent Salisbury

September 10, 2013

At VMworld 2013, VMware released its much-anticipated NSX network virtualization platform. A growing number of virtual ports have appeared in the data center due to server virtualization, and NSX takes advantage of these virtual switches to create an overlay that runs over the physical network.

NSX uses SDN concepts to implement network virtualization. It uses flow-based forwarding via OpenFlow to instantiate the network flows. Flow forwarding exposes various L2-L4 header fields, along with Layer 1 logical and physical interfaces. Flow forwarding is an important concept because that combination of header attributes describes application traffic, not just two addresses in a network.
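To make the idea of matching on a combination of header fields concrete, here is a minimal sketch of a flow table lookup. It is not NSX or Open vSwitch code; the class names, field names and action strings are all hypothetical, chosen only to illustrate how a flow entry can describe application traffic rather than just a pair of addresses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    """A simplified packet: ingress interface plus a few L2-L4 header fields."""
    in_port: int
    eth_src: str
    eth_dst: str
    ip_src: str
    ip_dst: str
    ip_proto: int
    tp_dst: int


@dataclass(frozen=True)
class FlowEntry:
    """A hypothetical flow entry. A field left as None is a wildcard."""
    priority: int
    action: str
    in_port: Optional[int] = None
    eth_dst: Optional[str] = None
    ip_dst: Optional[str] = None
    ip_proto: Optional[int] = None
    tp_dst: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        fields = ("in_port", "eth_dst", "ip_dst", "ip_proto", "tp_dst")
        return all(
            getattr(self, name) is None or getattr(self, name) == getattr(pkt, name)
            for name in fields
        )


def lookup(table: List[FlowEntry], pkt: Packet) -> str:
    """Return the action of the highest-priority matching entry.

    On a table miss the packet is handed to the controller (assumed behavior
    for this sketch).
    """
    candidates = [entry for entry in table if entry.matches(pkt)]
    if not candidates:
        return "punt_to_controller"
    return max(candidates, key=lambda entry: entry.priority).action
```

An entry that sets ip_dst, ip_proto=6 and tp_dst=443 describes HTTPS traffic to one server, while an entry with every field left as None acts as a catch-all.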

By unifying business logic and network headers, network control applications can install a simple forwarding policy or complex forwarding and service insertions.

Nested inside of the ESX hypervisor is a virtual switch (vSwitch). The vSwitch can operate as a typical flood-and-learn switch, or attach to the NSX controller. NSX builds tunnels between vSwitches using the VXLAN protocol (among others). These tunnels originate and terminate in VXLAN Tunnel Endpoints (VTEPs).

A VTEP can be instantiated in servers or in a hardware switch, if it is supported in the switch's firmware and silicon. This creates a data path connecting two data planes over an L2 or L3 network. One vSwitch gives the appearance of being directly connected to another. Each vSwitch is now a VTEP in that particular virtual topology.

[Figure: Underlying physical Ethernet fabric. Source: Brent Salisbury]

The overlay approach provides two important benefits for developers.

First, overlays bypass the constraints presented in the native physical network such as VLAN IDs, discontiguous networks or disjointed administrative network domains.

Second, tunnels remove the need for developers to extract and validate the paths, health and performance of the underlying network. Without tunnels, developers would have to program applications to discover the network state of each autonomous network system, which is an impractical requirement.

Flow programming is not a trivial implementation, even if focusing only on the edge. With point-to-point tunnels between every vSwitch, all network elements appear directly connected to one another.

Instead of focusing on each node in the network, network applications can focus on business logic and the task of classifying application traffic--and in turn applying policy. It begins to create a modular environment.
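As a rough illustration of what point-to-point tunnels between every vSwitch imply, the sketch below enumerates a full mesh. It assumes each tunnel is counted once as a bidirectional pair, and the function names are hypothetical. With n VTEPs the mesh needs n(n-1)/2 tunnels.

```python
from itertools import combinations
from typing import List, Tuple


def full_mesh_tunnels(vteps: List[str]) -> List[Tuple[str, str]]:
    """Return every unordered pair of VTEPs, one bidirectional tunnel per pair."""
    return list(combinations(sorted(vteps), 2))


def tunnel_count(n: int) -> int:
    """Number of tunnels in a full mesh of n VTEPs: n * (n - 1) / 2."""
    return n * (n - 1) // 2
```

Four hypervisors need six tunnels and 100 hypervisors need 4,950, which is one reason a controller, rather than an operator, builds and tracks them.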

[Figure: VTEP edge appears directly connected. Source: Brent Salisbury]

The use of VXLAN enables a greater degree of segmentation in the data center network. Traditionally, tenants are segmented from one another using VLAN IDs (VID). However, the 12-bit data structure of the VLAN ID means that a data center network is limited to 4,094 VLANs. By contrast, VXLAN includes a 24-bit VNI (VXLAN Network Identifier), which provides over 16 million available IDs.
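The arithmetic behind those numbers, and the place the VNI occupies on the wire, can be sketched in a few lines. The layout below follows the 8-byte VXLAN header as later published in RFC 7348 (one flags byte with the "valid VNI" bit set, 24 reserved bits, the 24-bit VNI, 8 reserved bits). The function names are my own and are not part of any NSX API.

```python
import struct

VLAN_ID_BITS = 12
VNI_BITS = 24
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag in the first header byte


def usable_vlans() -> int:
    """12-bit VLAN ID space minus the two reserved values (0 and 4095)."""
    return 2 ** VLAN_ID_BITS - 2


def vni_space() -> int:
    """Size of the 24-bit VNI space."""
    return 2 ** VNI_BITS


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2 ** VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits.
    # Word 2: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the valid-VNI flag."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word2 >> 8
```

Here usable_vlans() gives 4,094 and vni_space() gives 16,777,216, matching the figures above.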

Currently most switching hardware is constrained by the VLAN field processor embedded in the silicon. Generally speaking, the current generation of switches can process 250 to 500 VLAN IDs in hardware without affecting performance.

Don't expect hardware vendors to support millions of VXLAN VNIs right away. Vendors are presumably using the same VLAN field processors and silicon resources to process VXLAN as they would VIDs.

NSX Forwarding

In the NSX architecture, a VM boots and the host registers with the NSX controller. The controller consults a table that identifies the tenant and returns the topology the host should participate in to the vSwitch. The key identifier for virtual isolation is the VNI, which maps to a tenant's VXLAN-segmented topology.
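The registration step can be pictured with a small sketch. This is not the NSX controller's actual data model. The class, its tables and the shape of the reply are hypothetical, and they only illustrate the sequence described above: identify the tenant, map it to a VNI, and return the topology the host should join.

```python
from typing import Dict, Set


class Controller:
    """Hypothetical controller: maps VMs to tenants and tenants to VNIs."""

    def __init__(self, vm_to_tenant: Dict[str, str], tenant_to_vni: Dict[str, int]):
        self.vm_to_tenant = vm_to_tenant
        self.tenant_to_vni = tenant_to_vni
        self.vteps_by_vni: Dict[int, Set[str]] = {}

    def register(self, vm_id: str, vtep_ip: str) -> dict:
        """Register a booted VM's host and return the topology to participate in."""
        tenant = self.vm_to_tenant[vm_id]
        vni = self.tenant_to_vni[tenant]
        members = self.vteps_by_vni.setdefault(vni, set())
        # Peers are the VTEPs already in this tenant's overlay, excluding the caller.
        peers = sorted(members - {vtep_ip})
        members.add(vtep_ip)
        return {"tenant": tenant, "vni": vni, "peers": peers}
```

A second host that registers a VM for the same tenant receives the first host's VTEP as a peer, while a host registering a VM for a different tenant gets a different VNI and an empty peer list.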

Layer 3 forwarding between broadcast domains is supported at the edge of the NSX network in the vSwitch. ARP requests are punted to the controller, which looks up the destination MAC address and the destination VTEP in its host table.

If the host is not found, the traffic can be dropped or forwarded to a BUM traffic service node. Host discovery is forwarded to the VTEPs in the tenant tunnel overlay with multicast. VMware says it will soon support unicast, as well. Unicast is always attractive, particularly for interconnects relying on provider networks or the Internet.
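The lookup and its fallback can be sketched as follows. The table layout, method names and return values are assumptions made for illustration and are not taken from NSX. The sketch only mirrors the decision described above: a hit returns the destination MAC and VTEP, and a miss is either handed to a BUM service node or dropped.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class HostEntry(NamedTuple):
    mac: str
    vtep_ip: str


class HostTable:
    """Hypothetical controller host table keyed by (VNI, IP address)."""

    def __init__(self, bum_service_node: Optional[str] = None):
        self.entries: Dict[Tuple[int, str], HostEntry] = {}
        self.bum_service_node = bum_service_node

    def learn(self, vni: int, ip: str, mac: str, vtep_ip: str) -> None:
        self.entries[(vni, ip)] = HostEntry(mac, vtep_ip)

    def resolve(self, vni: int, target_ip: str) -> Tuple[str, Optional[str], Optional[str]]:
        """Answer a punted ARP request: (decision, destination MAC, next-hop VTEP)."""
        entry = self.entries.get((vni, target_ip))
        if entry is not None:
            return ("forward", entry.mac, entry.vtep_ip)
        if self.bum_service_node is not None:
            # Unknown host: hand the traffic to the BUM service node.
            return ("bum", None, self.bum_service_node)
        return ("drop", None, None)
```

Keying the table on the VNI as well as the IP address keeps tenants with overlapping address space isolated from one another.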

Integrating Overlays

Migration to an overlay is a challenge that few in the vendor, research or network community have been willing to go on record to articulate. The decoupled overlay needs to eventually integrate back into the physical network.

The NSX approach uses gateways that can either be delivered in software on x86 hardware or via physical switches from partners. The partner list includes Arista, Brocade, Cumulus, Dell, HP and Juniper Networks, with Arista demoing working code at VMworld.

The gateway (both hardware and software) registers with the NSX controller and appears as an element in the NSX infrastructure. Once the physical switch connects to the NSX controller, the controller instructs the switch to build tunnels into the tunnel fabric, thus establishing connectivity between the underlay and overlay networks.

NSX exposes a northbound interface, the NSX API, and uses two open southbound protocols, OpenFlow and OVSDB. NSX uses the Open vSwitch database protocol (OVSDB) for data plane programming and management.

NSX also exchanges VTEP and associated VM MAC addresses on those tunnels using the JSON-based OVSDB protocol. Unlike the hypervisors on the soft edge, the physical switches do not build OpenFlow control channels to the controller for forwarding.
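To give a feel for what "JSON-based" means here, the sketch below builds two OVSDB requests. OVSDB is a JSON-RPC protocol, and list_dbs and get_schema are two of its standard methods. The helper name is hypothetical, and the database name "hardware_vtep" is an assumption about what a gateway switch might expose, not something stated in VMware's launch material.

```python
import json
from typing import Any, List


def rpc_request(method: str, params: List[Any], request_id: int) -> str:
    """Serialize one OVSDB JSON-RPC request as a JSON string."""
    return json.dumps({"method": method, "params": params, "id": request_id})


# Ask the server which databases it hosts.
list_dbs = rpc_request("list_dbs", [], 0)

# Ask for the schema of one database (database name is an assumption).
get_schema = rpc_request("get_schema", ["hardware_vtep"], 1)
```

The MAC-to-VTEP bindings themselves would be written and read with further requests against that database, which this sketch leaves out.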

Where Do We Go From Here?

The launch of NSX kicked off a lot of debate between overlays and underlays. What's clear is that physical and virtual networks are not mutually exclusive. Each one depends on the other. Creating overlay networks does not exclude the importance of a solid and scalable Ethernet fabric--the physical fabric is as important as ever.

That said, I think the overlay offers certain advantages.

• Time to market with new services and features is measured in software development lifecycles rather than silicon foundry order and merchant silicon lifecycles.

• Capacity planning of network services begins to resemble the capex model found in storage and compute. Customers can buy the capacity they need and not the capacity they think they may need at some point in the hardware lifecycle.

• Packet processing and flow table constraints are significantly reduced on 32-core servers vs. current-generation ASICs.

Of course, there are also significant issues that need to be addressed in the overlay approach.

First, decoupling forwarding awareness of the overlay with a protocol (VXLAN) that sends frames unmodified into tunnels could introduce fragility if network state becomes inconsistent with the tunnel fabric.

Today, VXLAN overlays have no awareness of the plumbing underneath. An endpoint can either reach its target or it cannot. For example, if a physical circuit is discarding frames, there is no mechanism today that instructs the overlay to avoid that problematic path. That said, dynamically self-healing networks don't exist today anyway. The question is whether the overlay should provision and extract attributes to the underlay, or vice versa.

Second, regardless of the abstraction method, many underlay networks are fragile. Routing scales and bridging does not. Overlays should be complementary to stable and uncongested physical fabrics, not a fix for unstable, congested underlays.

There are undoubtedly still plenty of details and bugs to work out, but the overlay framework is here to stay. When it comes to actually implementing SDN in the data center, decoupling the logical from the physical constraints appears to be the path of least resistance, particularly if an organization needs to leverage existing hardware investments.

It's evident to me that it would take a software company to push networking out of its comfort zone. Now that that's happened, it raises other questions. I wonder who blinks first for API support between Cisco and VMware? If Cisco is willing to decouple any control (particularly OVSDB) to a VMware network controller, that will be an interesting indicator of its own SDN strategy.

I also wonder if Cisco will dust off the Citrix acquisition playbook and shed EMC for good, or stick to the arms dealer strategy in server virtualization.
