VMware Execs Talk Virtualized Networks And Visibility

VMware's Martin Casado and Bruce Davie discuss the benefits of one aspect of software-defined networking: overlaying a physical network with a virtual network.

Martin Casado

October 18, 2013

7 Min Read

Editor's note: VMware's Martin Casado and Bruce Davie team up to comment on the benefits of one aspect of software-defined networking: overlaying a physical network with a virtual net. The authors didn't necessarily intend it that way, but this piece in large measure answers the questions raised by Padmasree Warrior, CTO of Cisco, in her Aug. 28 blog, Limitations of a Software-Only Approach to Data Center Networking.

With the recent launch of the VMware NSX network virtualization platform, there has been a surge of interest in network virtualization technologies. A common technique across many network virtualization solutions is the use of some sort of "overlay tunnel" approach, such as Virtual Extensible LAN (VXLAN, backed by VMware and others), Network Virtualization using Generic Routing Encapsulation (NVGRE, backed by Microsoft), or Stateless Transport Tunneling (STT), an IETF draft. Overlays provide a means to encapsulate traffic traversing virtual networks so that the physical network is only responsible for forwarding packets from edge to edge, using the outer header. (An earlier VMware post on overlay tunnels is here.)
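To make the encapsulation idea concrete, here is a minimal sketch of the 8-byte VXLAN header format. It is purely illustrative; the outer UDP/IP/Ethernet headers, which are all the physical underlay ever forwards on, are omitted.

```python
import struct

VXLAN_FLAGS_VNI_VALID = 0x08  # "I" flag: the VNI field is valid

def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend an 8-byte VXLAN header to an inner Ethernet frame.

    Layout: flags (1 byte) | reserved (3 bytes) | VNI (3 bytes) | reserved (1 byte).
    In a real deployment the result would then be wrapped in outer UDP/IP/Ethernet
    headers; the underlay forwards on those outer headers alone.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Pack the 24-bit VNI into the top 3 bytes of a 4-byte field.
    header = struct.pack("!B3xI", VXLAN_FLAGS_VNI_VALID, vni << 8)
    return header + inner_frame

def vxlan_decapsulate(packet: bytes) -> tuple[int, bytes]:
    """Strip the VXLAN header, returning (vni, inner_frame)."""
    flags, vni_field = struct.unpack("!B3xI", packet[:8])
    assert flags & VXLAN_FLAGS_VNI_VALID
    return vni_field >> 8, packet[8:]
```

The virtual network identifier (VNI) in the header is what lets a single physical fabric carry many isolated virtual networks side by side.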

One question that came up following the NSX launch was around the impact of overlay technologies on network visibility, so we'll address that question here.

In our experience, a well-designed network virtualization solution can actually solve visibility issues by providing an unprecedented ability to monitor and troubleshoot virtual networks. We discuss below some of the monitoring and troubleshooting tools that can be (and have been) provided in an overlay-based network virtualization platform. These tools enable an operator to determine which problems are in the overlay versus the underlay, and to diagnose and rectify problems in either layer by viewing the underlay and the overlay in a unified manner.


As soon as you start virtualizing servers, you face the problem of how to troubleshoot connectivity and performance problems between VMs. This problem doesn't arise due to network virtualization -- it's there as soon as you start communicating between virtual machines. If there is a connectivity problem, you'll need to figure out if it's in the vSwitches, the physical network, or some intermediate device that might be intercepting traffic such as a firewall. What a network virtualization overlay provides is a mechanism to decompose the problem into constituent parts, which can then be diagnosed separately.

Consider the VMware NSX platform. It provides the ability to create, manage and monitor virtual networks from a central API. If you want to see what's up with the connectivity between a pair of VMs, for example, you (or a software tool) can issue an API request to query the status of the virtual network that is supposed to be connecting those VMs.


This request returns byte and packet counters on each of the virtual ports. Another API allows you to inject synthetic traffic as if it came from a given VM and see what happens to it. Did it get dropped due to an access control list (ACL) rule in the virtual network that was too restrictive? Did it fail to traverse the physical network linking the hypervisors? Either you pinpoint the problem in the virtual network, or you identify the physical path that's causing the problem.
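The fault-localization logic behind those API queries can be sketched as follows. The counter names here are illustrative stand-ins, not the actual NSX API schema: the point is only how per-port counters at each end let you decide whether a problem lives in the virtual layer or the physical one.

```python
def localize_fault(src_port: dict, dst_port: dict) -> str:
    """Decide where a VM-VM connectivity fault lies from virtual-port counters.

    src_port / dst_port hold hypothetical counters queried via a virtual
    network API: packets dropped by ACLs, transmitted, and received.
    """
    if src_port["acl_drops"] > 0:
        return "virtual: traffic dropped by an ACL at the source vSwitch"
    if src_port["tx_packets"] == 0:
        return "virtual: VM traffic never reached its virtual port"
    if dst_port["rx_packets"] == 0:
        return "physical: packets left the source hypervisor but never arrived"
    return "no fault visible at the virtual layer"
```

For example, nonzero ACL drops at the source point to an overly restrictive virtual-network rule, while packets transmitted but never received indict the physical path between hypervisors.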

Since we continuously check the health of overlay tunnels using health check packets between the hypervisors, we can often detect problems in the physical network before we have a problem with VM-VM connectivity. Because the virtualization controller knows which virtual networks depend on a given tunnel, it can identify which virtual networks are affected by a physical network fault.
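A simplified sketch of this tunnel health-checking idea, under assumed data structures: the `probe` callable stands in for the actual health-check packets between hypervisors, and the tunnel-to-network mapping mirrors what the virtualization controller already knows.

```python
import itertools

def check_tunnels(hypervisors: set, probe, tunnel_to_vnets: dict) -> list:
    """Probe every hypervisor pair; report virtual networks on a failed tunnel.

    probe(a, b) -> bool is a stand-in for sending a health-check packet over
    the tunnel between hypervisors a and b. tunnel_to_vnets maps each
    (a, b) tunnel to the set of virtual networks that depend on it.
    """
    impacted = set()
    for a, b in itertools.combinations(sorted(hypervisors), 2):
        if not probe(a, b):
            impacted |= tunnel_to_vnets.get((a, b), set())
    return sorted(impacted)
```

Because the controller holds the tunnel-to-network mapping, a single failed probe immediately identifies every affected tenant network, often before any VM-VM traffic is disrupted.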

Of course, the network virtualization layer doesn't know everything about the physical layer. So if, for example, we've tracked a problem down to a particular physical path between hypervisors, it's still going to require some traditional network monitoring tools to further localize the issue to a particular link or device in the network. But the visibility that we have in the virtual networking layer gives us the critical ability to isolate problems to either the virtual or the physical world. Furthermore, many tools that exist to manage physical networks can readily be extended to consume the detailed information that is available from the virtual network layer.

It should be clear that this is quite a change from the way we've traditionally operated networks. You can't just point a network protocol analyzer, such as Wireshark, at a physical link and figure out the problem from there. That's not because the protocol analyzer can no longer see the traffic in the overlay -- it's quite straightforward to parse a VXLAN header and see the protocol headers of VM-VM traffic, and thus observe the inter-VM traffic on a given link. It's just that the analyzer can't be the only tool in the toolbox; rather, it's one tool complemented by the large amount of information that can be gathered from the network virtualization layer.

Note that visibility isn't always just about troubleshooting, but also refers to the problem of mapping VM-VM traffic onto physical network resources. When you build overlay tunnels between hypervisors, traffic is going to follow whatever path the physical (or underlay) network thinks is best. That path will be based on the outer tunnel header and the routing algorithms of the physical network.

The good news is that, by moving a lot of network functions into the vSwitches, overlays make it appealing to use layer 3 routing in the physical network. Many modern data centers are now adopting "leaf-spine" or Clos network L3 designs with large amounts of equal-cost multipath (ECMP) across the core of the data center. These designs efficiently spread traffic across many parallel paths, which often does away with the need to carefully traffic engineer specific traffic flows to certain paths.

If it turns out that ECMP isn't possible in the physical network, then there are a few options. One is to use VM placement as a tool to reduce congestion. This has been done in the past, but network virtualization overlays allow complete freedom to place workloads anywhere in the data center. So if traffic between a pair of VMs is traversing a link and causing congestion, one option is to move a VM so that the traffic no longer traverses that link. And in fact, data gathered from the network virtualization layer can be used to guide VM placement decisions to reduce congestion -- that is, the greater visibility that we have in the virtual networking layer can be used to improve performance at the physical network layer.
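A greedy version of this placement idea can be sketched as follows, using the kind of flow-level data the virtualization layer collects. The flow records and link names are hypothetical; a real placement engine would also weigh CPU, memory, and affinity constraints.

```python
def suggest_vm_move(flows: list, congested_link: str):
    """Suggest a VM to relocate to relieve a congested physical link.

    flows: list of (src_vm, dst_vm, mbps, links_traversed) records, as might
    be gathered from vSwitch flow statistics. Returns the source VM of the
    heaviest flow crossing the congested link, or None if no flow crosses it.
    """
    crossing = [f for f in flows if congested_link in f[3]]
    if not crossing:
        return None
    src_vm, dst_vm, mbps, _ = max(crossing, key=lambda f: f[2])
    return src_vm  # candidate to move so its traffic takes a different path
```

Moving the heaviest contributor first is a simple heuristic, but it captures the core point: flow data from the virtual layer directly informs a physical-layer optimization.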

Finally, we note that some underlays are appearing with explicit routing capabilities -- for example, a high-speed, fiber-optic network dedicated to high-bandwidth uses, such as moving large amounts of real-time exchange information to trading desks. The network virtualization overlay is perfectly positioned to provide information about such flows, because it is already gathering flow-level information directly from vSwitches as described above. It's not a big leap to start passing that information to an underlay that can act on it.

To sum up, network virtualization shouldn't be seen as a big problem for network visibility. Instead, it's opening up new possibilities for improved visibility into the communication between VMs. With rich APIs for inspecting virtual networks and a combination of new and existing tools, the ability to understand what's going on in data center networks is only getting better.

About the Author

Martin Casado

Former CTO Networking, VMware
