08/13/2014 8:00 AM

Inside White-Box Switches

Switches traditionally use ASICs for packet forwarding, but the new breed of white-box switches relies more heavily on generic CPUs to handle forwarding in order to reduce costs.

In my last article, I discussed the advantages of running white-box switches and operating systems from companies like Big Switch Networks and Cumulus Networks. One of the advantages of these OSs is that they reduce the number of processes running in a monolithic kernel, freeing the switch's CPU to devote more cycles to forwarding packets.

As many readers pointed out, modern switching architecture uses application-specific integrated circuits (ASICs) for packet forwarding. They were entirely correct. However, there are a lot of reasons why some vendors are moving away from this architecture in favor of a general-purpose CPU. Let's take a closer look at this trend.

On a modern enterprise or data center switch, the CPU is responsible for the control plane. It runs the operating system as well as the higher order functions, like layer 3 packet lookups and routing decisions. Once these lookups are made the first time, they are programmed into forwarding tables. From this point on (until the expiration of the entry), packets entering the data plane are forwarded by an ASIC using the forwarding table stored in ternary content addressable memory (TCAM). This all happens very fast.
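The "look up once, then forward from the table" pattern above can be sketched in a few lines. This is an illustrative model only, not any vendor's API: the dict stands in for the TCAM-backed forwarding table, and `slow_path_lookup` is a stub for the CPU's routing decision.

```python
# Sketch of the fast-path/slow-path split: the first packet to a
# destination is punted to the CPU, which programs a table entry;
# later packets hit the table directly.

forwarding_table = {}  # stands in for TCAM-backed forwarding entries

def slow_path_lookup(dst_ip):
    """CPU-side routing decision; here just a stub rule."""
    return "eth1" if dst_ip.startswith("10.") else "eth0"

def forward(dst_ip):
    entry = forwarding_table.get(dst_ip)
    if entry is None:                      # first packet: punt to CPU
        entry = slow_path_lookup(dst_ip)
        forwarding_table[dst_ip] = entry   # program the forwarding table
    return entry                           # later packets: fast path
```

Until the entry expires, every subsequent packet to that destination is forwarded without touching the slow path.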

But switch performance degrades when too many packets must be shunted to the CPU for handling. This is known as "process switching" in Cisco terminology. It requires the CPU to examine every packet entering the switch, which slows forwarding considerably; packets buffer until they are eventually dropped.

ASICs remove the need for process switching once a forwarding table is built. With mechanisms like Cisco Express Forwarding (CEF), the OS builds a software forwarding table to expedite fast switching on platforms that lack switching ASICs, such as routers.
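The key difference in the CEF approach is that the complete forwarding table is precomputed from the routing table before traffic arrives, so no packet needs a first-lookup punt. A minimal sketch, with invented prefixes and interface names:

```python
# Hedged sketch of a CEF-style software FIB: build the whole table up
# front from the routing information base (RIB), then do a pure
# longest-prefix-match lookup per packet with no slow-path fallback.

import ipaddress

rib = {
    "10.0.0.0/8": "eth1",
    "10.1.0.0/16": "eth2",
    "0.0.0.0/0": "eth0",   # default route
}

# Build the FIB once, ordered longest-prefix-first.
fib = sorted(
    ((ipaddress.ip_network(p), nh) for p, nh in rib.items()),
    key=lambda entry: entry[0].prefixlen,
    reverse=True,
)

def fib_lookup(dst_ip):
    """Longest-prefix match against the precomputed FIB."""
    addr = ipaddress.ip_address(dst_ip)
    for net, next_hop in fib:
        if addr in net:
            return next_hop
    return None
```

Real implementations use trie structures rather than a sorted linear scan, but the precompute-then-match shape is the same.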

So process-switched packets are bad. And having the CPU handle everything is bad -- or is it? Think for a moment about an entirely virtual switch, like VMware's vSphere Distributed Switch or Cisco's Nexus 1000v. These switches are able to forward packets without having a dedicated ASIC.

In fact, they use the same x86 CPU used by the hypervisor and the guest virtual machines on a system. All of the forwarding decisions are done with a non-ASIC table. Yes, the performance on these switches won't reach the terabit level. But they are more than capable of forwarding traffic at an acceptable rate by using a slice of the host CPU.
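To see how little is actually required, here is a toy MAC-learning switch in plain Python. It is a simplification, not how vSphere or the Nexus 1000v are implemented, but it shows that ordinary CPU code and a dict can play the role an ASIC and CAM table play in hardware.

```python
# Toy layer-2 learning switch: learn source MACs, unicast known
# destinations, flood unknown ones.

class SoftSwitch:
    def __init__(self, ports):
        self.ports = ports
        self.mac_table = {}  # learned MAC -> port (the "CAM table")

    def receive(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port        # learn the source
        out = self.mac_table.get(dst_mac)
        if out is None:                          # unknown destination: flood
            return [p for p in self.ports if p != in_port]
        return [out]                             # known destination: unicast

sw = SoftSwitch(["p1", "p2", "p3"])
print(sw.receive("p1", "aa", "bb"))  # unknown dst: floods to p2 and p3
print(sw.receive("p2", "bb", "aa"))  # "aa" was learned: unicast to p1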

The idea of using a generic x86 CPU as a forwarding device is gaining traction. A research paper from Microsoft shows the company is developing a switch that uses the main CPU as a traffic co-processor by building a large forwarding table for flows. This design replicates the CAM table in regular memory and uses the CPU for forwarding instead of a dedicated ASIC. Now, why would you want to do that?

The truth is that the ASICs and the TCAM that back these high-speed forwarding mechanisms are expensive. TCAM chips cost several hundred dollars each.

To balance the cost of the switch against its forwarding performance, manufacturers must keep the number of TCAM chips small. That means the forwarding tables can only be as large as the TCAM installed, which is a finite resource. To use bigger forwarding tables, you must find memory elsewhere. That's where ancillary sources come into play.

If you are going to use main system memory to hold bigger forwarding tables, it makes sense that you would use the main system CPU to help forward those packets. As fast as ASICs are, they can only forward a certain number of packets in a given interval.

By giving the forwarding plane access to idle cycles on the main CPU, you can use that resource to process forwarded flows. Those flows don't have to be high priority. Even using the CPU for bulk traffic would ensure that the port ASICs are free to take care of important flows when they arrive.
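The split described above can be sketched as a simple flow-placement policy. The capacity number and flow classes here are invented for illustration; the point is only the shape of the decision: latency-sensitive flows claim the limited hardware path, and everything else rides the main CPU.

```python
# Hypothetical hybrid placement: high-priority flows go to the ASIC's
# limited table slots; overflow and bulk traffic are handled in software.

ASIC_CAPACITY = 2  # invented: concurrent flows the hardware table holds

asic_flows, cpu_flows = [], []

def place_flow(flow_id, priority):
    if priority == "high" and len(asic_flows) < ASIC_CAPACITY:
        asic_flows.append(flow_id)   # hardware fast path
    else:
        cpu_flows.append(flow_id)    # bulk traffic rides the main CPU

for fid, prio in [("voip", "high"), ("storage", "high"),
                  ("backup", "low"), ("video", "high")]:
    place_flow(fid, prio)

print(asic_flows)  # ['voip', 'storage']
print(cpu_flows)   # ['backup', 'video'] -- overflow handled in software
```

Note that even a high-priority flow ("video") falls back to the CPU once the ASIC slots are exhausted, which is exactly the pressure that makes the CPU co-processor idea attractive.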

White-box switches provide value based on their ability to deliver performance at a low cost. By analyzing the pieces of the system, we can find ways to use the entire system to help augment forwarding behavior. By reducing the amount of TCAM or the number of ASICs in a given switch, the price can be reduced without significant performance impact. As the white-box market matures, I expect we will see more vendors taking this unique approach.



A specialized chip (an ASIC) can be a good solution when the required processing doesn't fit well on a general-purpose CPU (such as the x86 architecture). However, specialized chips can be expensive because they don't enjoy the same economies of mass production that general-purpose CPUs do.

The x86, ARM, GPU, and Power architectures are well-known general-purpose compute architectures; I wonder which of them would be best suited to the computational needs of switching.

Distributed Switching

It's impressive how much the CPU can do now compared to how things used to be. Certainly at lower speeds, it seems entirely feasible to use CPU switching. As you go up the chain, it seems less plausible that you could get the level of throughput desired without some kind of ASIC in the chain. Still, the more you can take off expensive chips, the better.