
08:02 PM
Greg Ferro

VMware's SDN Dilemma: VXLAN or Nicira?

VMware has invested in two overlay network approaches: the VXLAN standard, originally conceived by Cisco, and STT, drafted by SDN startup Nicira, which VMware acquired for more than a billion dollars. Which will VMware choose? Here’s my take.

VMware has a technology problem: it's backing two competing standards for overlay networks, Nicira's STT and the IETF draft standard VXLAN. An overlay network enables network virtualization, which is a core component of VMware's software-defined data center initiative. Both STT and VXLAN have upsides and downsides. I'll look at each protocol and speculate on which direction VMware may go.

First, a little background. Before being acquired by VMware, Nicira developed the Stateless Transport Tunneling (STT) protocol for tunneling between open source software switches in the Open vSwitch project.

VXLAN, which is now an IETF draft standard, was originally proposed by Cisco. Cisco sources say that the company then got VMware involved (although the IETF draft has a lot of names on it). The end result is that VMware is telling everyone it has this great VXLAN overlay network technology that removes any hypervisor dependency from physical network devices. Even better, VXLAN is configured and managed from vCenter.

The question is, which protocol will win?

Nicira and STT

Prior to the acquisition, Nicira had a software controller for managing tunnels between virtual switches, and it used OpenFlow-like commands to configure each vSwitch. STT is a tunneling protocol that connects the virtual switches, thus forming a virtual network.

STT performs this task well enough. Rather than running a real TCP session, it uses a TCP-like header for encapsulation. Supposedly, this lets operating systems hand large frames to the TCP segmentation offload function of modern network adapters for better performance.
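To make the offload argument concrete, here is a simplified model of what happens when the host hands one large STT frame to a NIC that performs TCP segmentation offload. It assumes an IPv4 outer header without options, untagged outer Ethernet, a 1460-byte MSS (1500-byte MTU), and an 18-byte STT frame header (including its padding) per our reading of the Nicira draft. These sizes are assumptions for illustration, not measurements.

```python
import math

# Header sizes in bytes (assumptions for this sketch).
OUTER_ETH = 14       # untagged outer Ethernet header
OUTER_IPV4 = 20      # outer IPv4 header, no options
TCP_LIKE = 20        # STT's TCP-like header, no options
STT_FRAME_HDR = 18   # STT frame header, carried once at the front of the payload


def stt_wire_segments(inner_frame_len: int, mss: int = 1460) -> int:
    """Number of wire packets the NIC emits when it segments one STT frame.

    The STT frame header is part of the payload being segmented, so it
    appears only in the first segment.
    """
    payload = STT_FRAME_HDR + inner_frame_len
    return math.ceil(payload / mss)


def stt_overhead_bytes(inner_frame_len: int, mss: int = 1460) -> int:
    """Encapsulation bytes added on the wire for one STT frame.

    Every segment repeats the outer Ethernet/IPv4/TCP-like headers;
    the STT frame header is paid only once per frame.
    """
    segments = stt_wire_segments(inner_frame_len, mss)
    return STT_FRAME_HDR + segments * (OUTER_ETH + OUTER_IPV4 + TCP_LIKE)


print(stt_wire_segments(64000))   # 44 segments for one ~64 KB frame
print(stt_overhead_bytes(64000))  # 18 + 44 * 54 = 2394 bytes
```

The point of the design is that the hypervisor's software switch does one encapsulation per large frame and the NIC hardware does the per-packet slicing, instead of the CPU encapsulating every MTU-sized packet.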

However, STT also has several limitations. First, the limited entropy in the STT header means traffic doesn't balance evenly over Ethernet port bundles in network backbones: switches pick a bundle member by hashing header fields, and if those fields vary little between flows, most flows hash to the same link. Depending on your network design, this could be a significant limitation.
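The sketch below does not model STT's actual header; it only illustrates the general mechanism behind the entropy concern. A switch chooses a port-bundle member by hashing outer header fields, so if 1,000 inner flows between two hypervisors all present the same outer 5-tuple, they all land on one link. SHA-256 stands in for whatever hash a real switch ASIC uses, and the addresses, bundle size and port number are hypothetical.

```python
import hashlib
from collections import Counter

N_LINKS = 4       # members in the port bundle (assumed)
DST_PORT = 7471   # assumed fixed outer destination port; any constant shows the same effect


def pick_member(outer: tuple, n_links: int) -> int:
    """Map a flow to a bundle member by hashing its outer header fields."""
    digest = hashlib.sha256(repr(outer).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def link_usage(outer_headers: list, n_links: int) -> Counter:
    """Count how many flows the hash places on each member link."""
    return Counter(pick_member(h, n_links) for h in outer_headers)


HV_A, HV_B = "10.0.0.1", "10.0.0.2"  # two hypothetical hypervisors

# Outer 5-tuple: (src IP, dst IP, protocol, src port, dst port).
# 1,000 inner flows with identical outer headers: the switch sees one "flow".
constant = [(HV_A, HV_B, 6, DST_PORT, DST_PORT) for _ in range(1000)]

# The same 1,000 flows, but with an outer source port that differs per inner flow.
varied = [(HV_A, HV_B, 6, 49152 + i, DST_PORT) for i in range(1000)]

print("identical outer headers:", link_usage(constant, N_LINKS))
print("per-flow source port:   ", link_usage(varied, N_LINKS))
```

With identical outer headers every flow uses a single member link; once a per-flow field varies, the flows spread across all members.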

Second, STT currently works only with the Open vSwitch software switch on Linux-based hypervisors such as Xen or KVM. That's not necessarily a problem for cloud providers and very large organizations; for instance, eBay is using Nicira in its OpenStack deployment. However, VMware is more common in enterprise data centers. It's possible VMware could add STT to the ESXi vSwitch and thus deliver a multicloud network overlay strategy, but the VXLAN protocol already has a lot of momentum.

VXLAN's Multicast Issues

VXLAN depends heavily on a multicast-enabled underlay network to handle broadcast, unknown-unicast and multicast Ethernet traffic. (I use the term "underlay network" to describe the physical devices that pass Ethernet frames and IP packets.) What's not well understood is that IP multicast is complex and risky to operate.

Each VXLAN-enabled device is known as a VXLAN Tunnel End Point (VTEP). When a VTEP is configured with a VXLAN, it joins the IP multicast group assigned to that VXLAN. Joining the multicast tree is how VTEPs discover, in a self-configuring and autonomous way, the MAC address of each host in the VXLAN and the VTEP that host sits behind. Direct server-to-server data flows are then transported through the VXLAN overlay in unicast packets.
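The encapsulation a VTEP applies is simple: it prepends an 8-byte VXLAN header carrying a 24-bit VXLAN Network Identifier (VNI) to the original Ethernet frame, then adds outer UDP, IP and Ethernet headers (RFC 7348 later standardized UDP port 4789). The 24-bit VNI allows about 16 million segments, versus roughly 4,000 usable VLAN IDs. This is a minimal sketch of building and parsing that header; the function names are our own.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # "!B3xI": flags byte, three zero bytes, then a 32-bit word whose top
    # 24 bits hold the VNI and whose low 8 bits are reserved (zero).
    return struct.pack("!B3xI", VXLAN_FLAG_I, vni << 8)


def parse_vni(packet: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags, word = struct.unpack("!B3xI", packet[:8])
    if not flags & VXLAN_FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """VXLAN header plus the original Ethernet frame (outer UDP/IP/Ethernet not shown)."""
    return vxlan_header(vni) + inner_frame


print(vxlan_header(1).hex())  # 0800000000000100
```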

IP multicast also provides an efficient way to deliver Ethernet frames that must reach every server in the VXLAN, for example flooded frames for unknown MAC addresses and IP ARP requests.
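The flood-and-learn behavior described above can be sketched as a toy VTEP for a single VXLAN. This is a hypothetical model, not a product API: it learns inner source MAC to remote VTEP mappings from decapsulated traffic, sends known unicast directly to the remote VTEP, and floods broadcast, multicast and unknown-unicast frames to the VXLAN's multicast group. The addresses are illustrative.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_group_mac(mac: str) -> bool:
    """True for broadcast/multicast MACs (low bit of the first octet set)."""
    return bool(int(mac.split(":")[0], 16) & 1)


@dataclass
class Vtep:
    """Toy flood-and-learn VTEP for one VXLAN (hypothetical model)."""
    ip: str                 # this VTEP's own tunnel address
    mcast_group: str        # multicast group assigned to the VXLAN
    mac_table: dict = field(default_factory=dict)  # inner MAC -> remote VTEP IP

    def learn(self, inner_src_mac: str, outer_src_ip: str) -> None:
        """On decapsulation, remember which VTEP the sending host sits behind."""
        self.mac_table[inner_src_mac] = outer_src_ip

    def next_hop(self, inner_dst_mac: str) -> tuple:
        """Known unicast goes to the remote VTEP; everything else is flooded."""
        if not is_group_mac(inner_dst_mac) and inner_dst_mac in self.mac_table:
            return ("unicast", self.mac_table[inner_dst_mac])
        return ("multicast", self.mcast_group)


vtep = Vtep(ip="10.0.0.1", mcast_group="239.1.1.1")
before = vtep.next_hop("00:50:56:aa:bb:cc")  # unknown MAC: flooded to the group
vtep.learn("00:50:56:aa:bb:cc", "10.0.0.2")  # a frame from that host arrives via VTEP 10.0.0.2
after = vtep.next_hop("00:50:56:aa:bb:cc")   # now delivered by unicast
```

Every unknown destination and every ARP request therefore turns into multicast traffic in the underlay, which is why the multicast design matters so much.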

VMware recommends a separate multicast group for each VXLAN; thus, 50 VXLANs would require 50 separate multicast trees, in an attempt to contain L2 Ethernet flooding. L2 loops remain a problem in VXLAN networks, but the failure domain is reduced to the individual VXLAN. The problem is that each of those multicast trees requires state to be held in the network layer, which consumes CPU, memory and TCAM space. TCAM size is a serious limitation on network diameter, and an overloaded TCAM is a serious threat to the network.
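A rough way to see how quickly that state grows: under PIM-SM, count one (*,G) entry per group plus one (S,G) entry per VTEP that sends BUM traffic into the group. The sketch assumes every member VTEP is a source, counts distinct entries rather than per-router copies, and uses hypothetical VNIs and addresses.

```python
from collections import defaultdict


def pim_state_entries(vni_to_group: dict, vni_members: dict) -> int:
    """Approximate multicast routing entries: (*,G) per group plus (S,G) per source VTEP.

    Assumptions: PIM-SM, every member VTEP sends BUM traffic, distinct
    entries only (each router on a tree would hold its own copy).
    """
    sources = defaultdict(set)
    for vni, group in vni_to_group.items():
        sources[group] |= vni_members[vni]
    return sum(1 + len(srcs) for srcs in sources.values())


# Hypothetical: 50 VXLANs, all spread across the same 10 VTEPs.
VTEPS = {f"10.0.0.{i}" for i in range(1, 11)}
MEMBERS = {vni: VTEPS for vni in range(5000, 5050)}

one_group_each = {vni: f"239.1.0.{vni - 5000}" for vni in MEMBERS}
one_shared_group = {vni: "239.1.1.1" for vni in MEMBERS}

print(pim_state_entries(one_group_each, MEMBERS))    # 50 * (1 + 10) = 550
print(pim_state_entries(one_shared_group, MEMBERS))  # 1 + 10 = 11
```

Sharing groups shrinks the state, but then every VTEP in the group receives flooded traffic for VXLANs it doesn't host, which is the flooding the per-VXLAN recommendation is trying to contain.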

A lesser performance problem is the frame replication silicon in the switches. At its core, multicast is a method for duplicating Ethernet frames inside the hardware of your network: one multicast frame must be sent out of every Ethernet port that needs to receive it. On a data center core switch, this could mean replicating one received frame to 300 ports (thus, 1 Gbps of inbound multicast traffic results in 300 Gbps of output). Network switches require dedicated silicon to handle the duplication process. For example, this is an approximation of the silicon pathways inside a single M1-series line card from a Nexus 7000, showing the replication engines on the blade:

Figure: Internal Architecture of Single Line Card Nexus 7000. Source: Cisco Systems
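The replication arithmetic from the 300-port core switch example can be written out. The frame-rate figure assumes fixed 1,500-byte frames and ignores preamble and inter-frame gap, so treat it as an order-of-magnitude estimate.

```python
def replicated_output_gbps(inbound_gbps: float, egress_ports: int) -> float:
    """Egress bandwidth when each inbound multicast frame is copied to every egress port."""
    return inbound_gbps * egress_ports


def copies_per_second(inbound_gbps: float, egress_ports: int, frame_bytes: int = 1500) -> float:
    """Frame copies per second the replication engines must generate.

    Assumes a fixed frame size; ignores preamble and inter-frame gap.
    """
    inbound_fps = inbound_gbps * 1e9 / (frame_bytes * 8)
    return inbound_fps * egress_ports


print(replicated_output_gbps(1, 300))    # 300
print(round(copies_per_second(1, 300)))  # 25000000
```

In other words, roughly 83,000 inbound frames per second become about 25 million outbound copies per second, all of which the replication engines must produce.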

There are a number of IP multicast routing protocols that maintain the multicast trees, including PIM Sparse Mode (PIM-SM), PIM Dense Mode (PIM-DM) and Bidirectional PIM (BiDir), all of which support the any-source multicast (ASM) model. In general terms, PIM-SM will be the default choice because it has the widest vendor support, but that isn't saying much. Most data center switches do not support multicast routing protocols today. This can make VXLAN hard to deploy in existing networks, and deployment usually requires new network hardware.

2/5/2013 | 7:35:02 PM
re: VMware's SDN Dilemma: VXLAN or Nicira?
I figure that VMware got their money's worth. They bought a network company without looking like they did. If VMware had bought an actual network vendor, then independence would have been an issue.

Importantly, the Nicira people can create & influence standards & I think that VMware needed to find better networking for vSphere & vCloud. The networking strategy until now has been quite poor.

Expect VMware to stay in software, enabling APIs like they did for storage to abstract the networking into tools and avoid dependency on someone else's platform.
2/5/2013 | 7:27:57 PM
re: VMware's SDN Dilemma: VXLAN or Nicira?
Not really. The STT point may or may not be correct, but the other point is marketing.
2/5/2013 | 7:26:22 PM
re: VMware's SDN Dilemma: VXLAN or Nicira?
It's possible, but I suspect there is enough hubris at Nicira to continue pushing STT as the solution. They were quite convinced of their cleverness when they announced it.

VMware is more pragmatic, though. Will they want to antagonise networking vendors by insisting on STT? Who wants to build hardware termination points for VXLAN & NVGRE and STT?

Not me. VXLAN is good enough; let's go with that as a standard and move on to the next problem.
2/4/2013 | 1:30:03 PM
re: VMware's SDN Dilemma: VXLAN or Nicira?
As usual, you have provided a very detailed and thoughtful analysis of a complex and confusing set of issues. Your contribution to keeping everybody real and informed about cloud networking and such is a great service to us all.

However, I think we may not yet have distilled this issue down to its simplest form. I think it is helpful to think of it in two separate buckets, the control plane and the data plane. VXLan and STT (and NVGRE, GRE and MPLS) can all be used in the data plane to encapsulate packets and create a transparent L2-over-L3 tunnel overlay. Each of these uses a different format for the encapsulation "envelope". Any or all of them can be implemented with a roughly similar amount of work in any virtual switch. The subset of available L2-over-L3 tunneling protocols supported by merchant switching silicon in the pswitch is more limited.

All of the L2-over-L3 overlay encapsulations require a control plane to communicate the L2 to L3 mappings (what is the IP address of the tunnel egress switch for MAC address X?). Nicira's solution to this happens to be proprietary, communicating MAC discovery in a floodless manner between the controller and the many distributed vSwitches. Alternative approaches, such as the one described in the 2009 VL2 paper, have also been detailed publicly. These approaches hide the kind of hideous shortcomings you have documented so well in an IP multicast based solution. As proven by the recent announcement of an alternative (proprietary) control plane for VXLan baked into the Cisco 1KV vswitch-based solution, the IP multicast hack was provided as an example of how one could build a control plane for VXLan, but was never mandated as the standard or required control plane for it. Since the day the VXLan draft was published, anybody with a clue has taken for granted that the IP multicast approach is a non-starter for virtually all prospective users.

The way I would anticipate this playing out in the market is that a number of proprietary schemes, based on SDN controllers, will be deployed as homogeneous, single vendor solutions. As this type of overlay proves valuable and easy to use, pressure to support interoperability between vendors will grow (although there will be enough open-source overlay implementations available to eliminate the need for interop to foster price competition). Once there is a solid control plane solution to propagate and synchronize L2 to L3 mappings in real time, it is quite easy to support any/all encapsulation formats that have value. VXLan will lead in pswitch-based termination (VTEP), since it's already shipping in merchant switch ASICs. STT will be better adapted to current server side hardware (TCP offload features) and valuable to some folks until the next generation of (VXLan enabled?) server NICs starts to ship. New encapsulations will come and go as the market evolves and the silicon evolves.

I see no reason why VMware/Nicira has to choose one and only one overlay format. I expect them to make an easy decision to use the overlay control plane that doesn't suck (NVP) over the multicast approach that does. Once this is integrated into their shipping product line, I would expect to see a flexible combination of encapsulation types offered based on market forces and use case tradeoffs. Furthermore, I expect every major vendor to have an L2-over-L3 overlay solution for network virtualization, and all of them to be based on an (initially proprietary) floodless control plane not based on IP multicast. Clearly there are other valuable assets that VMware controls as a result of the Nicira buy (the team, for example), but I expect that within a year or so overlay encapsulation implementing virtualization will become a commodity available in many flavors. Several months ago I had a conversation with an SDN development executive at a tier one network equipment company. I said, "Is it just me, or does that IBM DOVE story look like a straight Nicira rip-off?" His reply: "Of course it is. We're working on our own straight Nicira ripoff. Everybody will have one soon." I hope that VMware got a billion dollars worth of value beyond the overlay solution implementation, because the differentiation value of that technology is going to zero in the very near future.
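The floodless, controller-driven control plane described in this comment can be sketched in a few lines. This is a generic, hypothetical model, not Nicira NVP's actual design: the controller records where each VM's MAC address lives and pushes the L2-to-L3 mapping to every vSwitch, so a lookup either hits a pre-installed entry or fails, and nothing is ever flooded. Network IDs, MACs and IPs are illustrative.

```python
class VSwitch:
    """Hypothetical hypervisor switch whose tunnel table is filled only by the controller."""

    def __init__(self) -> None:
        self.table: dict = {}  # (network id, MAC) -> tunnel endpoint IP

    def tunnel_endpoint(self, net_id: int, dst_mac: str):
        """Floodless lookup: the pre-installed endpoint, or None (no flooding)."""
        return self.table.get((net_id, dst_mac))


class Controller:
    """Hypothetical controller that pushes L2-to-L3 mappings to every vSwitch."""

    def __init__(self) -> None:
        self.locations: dict = {}
        self.vswitches: list = []

    def register(self, vswitch: VSwitch) -> None:
        """A newly connected vSwitch receives every mapping already known."""
        self.vswitches.append(vswitch)
        vswitch.table.update(self.locations)

    def vm_attached(self, net_id: int, mac: str, hypervisor_ip: str) -> None:
        """Record where a VM now lives (boot or migration) and push it everywhere."""
        self.locations[(net_id, mac)] = hypervisor_ip
        for vs in self.vswitches:
            vs.table[(net_id, mac)] = hypervisor_ip


ctl = Controller()
vs1, vs2 = VSwitch(), VSwitch()
ctl.register(vs1)
ctl.vm_attached(5001, "00:50:56:00:00:01", "10.0.0.2")  # VM boots on hypervisor 10.0.0.2
ctl.register(vs2)                                        # a late joiner still learns it
ctl.vm_attached(5001, "00:50:56:00:00:01", "10.0.0.3")  # VM migrates to 10.0.0.3
```

Because the mapping is independent of the envelope, the same table could drive VXLAN, STT or any other encapsulation, which is the control plane/data plane split the comment argues for.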
2/3/2013 | 6:30:16 PM
re: VMware's SDN Dilemma: VXLAN or Nicira?
Brad Hedlund from VMware disagrees with a couple of fundamental points made in this article. Can you comment?

-- Umair Hoodbhoy
2/1/2013 | 10:39:12 PM
re: VMware's SDN Dilemma: VXLAN or Nicira?
Insightful dissection of two leading candidates to determine the software-defined network, Greg. You may be right, a modified VXLAN will win out, but Nicira will be quick to pursue any opening for itself. Charlie Babcock, InformationWeek