Comments
White-Box Switches: Are You Ready?
EtherealMind,
User Rank: Ninja
7/28/2014 | 3:12:15 PM
Re: Tom is talking about things he doesn't know about.
The initial purpose for FBOSS / Wedge was to address packet loss issues in a vendor switch at high utilisation. The value proposition was likely extended to include support for a BGP/SDN solution and removal of unnecessary code for reliability. 

Arista EOS software was extensively customised to support HFT systems and provide low-latency paths through the hardware architecture. 

Cisco NX-OS software on the Nexus 3064PQ series was extensively customised to provide low-latency features. You can find a great deal of information on Cisco's website about these modifications and their functions. 

The performance of the ASIC is rigidly determined by the VOQ algorithm, fabric arbitration and a number of other functions that are controlled by software. For example, in some software implementations, changes to the FIB cause packet loss while the FIB table is being updated in the ASIC. 
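
For anyone who hasn't seen that failure mode, here's a rough Python sketch (purely illustrative, hypothetical names, not any vendor SDK) of why a clear-and-rewrite FIB update drops traffic while a make-before-break update does not:

    # Toy model: the "ASIC FIB" is just a dict of prefix -> next-hop.
    def clear_and_rewrite(fib, new_routes):
        """The table is empty for a window; any lookup during that window misses."""
        fib.clear()              # blackhole window opens here
        fib.update(new_routes)   # and closes only when the rewrite finishes

    def make_before_break(fib, new_routes):
        """Install/overwrite entries first, withdraw stale ones last;
        every lookup during the update still hits a valid entry."""
        fib.update(new_routes)
        for prefix in list(fib):
            if prefix not in new_routes:
                del fib[prefix]

    fib = {"10.0.0.0/8": "eth1", "192.168.0.0/16": "eth2"}
    make_before_break(fib, {"10.0.0.0/8": "eth3", "192.168.0.0/16": "eth2"})
    print(fib)   # at no point in the update was the table empty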

Switch performance is determined by many factors. As with an x86 CPU, the ASIC fabric is the most important element, but the software and supporting hardware are critical to platform performance. The metaphor holds up well once you go deeper into the microcode in the ASIC itself and the firmware and algorithms that drive many of its functions. 

It's not obvious to most networking people since we have never needed to know this before. I certainly only learned this information in the last year or so. 
DavidS327,
User Rank: Apprentice
7/28/2014 | 11:15:17 AM
Re: Tom is talking about things he doesn't know about.
My understanding of FBOSS (Wedge) was that Facebook wanted to put these switches into their server management platform, so they created a switch that uses an x86 architecture for the control plane. I've not read anything about them needing to optimize switch performance, though maybe I've just not come across the proper information on this topic.  

Regarding Pluribus: in today's world, we update ASIC tables with a FIB, and that FIB is built per network rather than per flow. Once routing has converged, there is no need to update the FIB, and we have nanosecond switching. If Tom was referring to updating the ASIC with a per-flow table (and I don't see that he was saying this), then you have to build state for each and every flow. There is so much latency involved in that model that it is not a good fit for HFT; they would have to optimize the hell out of it just to get today's performance. If you are adding hundreds of microseconds of latency for each flow setup, you will be out of business rather quickly in this industry.  
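
To make the per-network versus per-flow point concrete, here's a toy Python sketch (hypothetical names, not any real switch API): a per-prefix FIB is written once at convergence, while a per-flow table needs a new entry, and a setup step, for every previously unseen flow:

    import ipaddress

    # Programmed once when routing converges; lookups after that are pure data plane.
    fib = {ipaddress.ip_network("10.0.0.0/8"): "eth1",
           ipaddress.ip_network("0.0.0.0/0"): "eth0"}

    def lpm_lookup(dst_ip):
        """Longest-prefix match against the converged FIB; no table writes."""
        matches = [net for net in fib if ipaddress.ip_address(dst_ip) in net]
        return fib[max(matches, key=lambda net: net.prefixlen)]

    flow_table = {}

    def per_flow_forward(five_tuple):
        """Every new 5-tuple pays a setup cost before it can be forwarded."""
        if five_tuple not in flow_table:                        # first packet of the flow
            flow_table[five_tuple] = lpm_lookup(five_tuple[1])  # the expensive setup path
        return flow_table[five_tuple]

    print(lpm_lookup("10.1.2.3"))                                     # eth1, no state created
    print(per_flow_forward(("10.9.9.9", "10.1.2.3", 6, 40000, 443)))  # eth1, after flow setup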

My comments are not simplistic but industry-specific; it is clear you are not in the HFT world. And note that I would not have piped up with my opinion had the author not made such a ludicrous industry-specific claim.

Though I am no server expert, your x86 architecture reference is a bad comparison. You are mixing complete application performance with forwarding performance. If we are just talking about forwarding performance, then a 10Gig NIC doing TCP offload is a function of the ASICs on the NIC and not of anything on the other side of the PCIe bus (read: main CPU and memory). 

 
EtherealMind,
User Rank: Ninja
7/28/2014 | 10:48:18 AM
Re: Tom is talking about things he doesn't know about.
The assertion about code reduction for performance gains is quite correct. For example, Facebook's FBOSS operating system is specifically designed to maximise throughput by focussing on removing packet drops across the switching ASIC at very high loads. This has been a common and well-known problem with branded solutions at Facebook and led to the Wedge hardware and FBOSS software. 

Pluribus Networks, and others, write their own device drivers for the silicon to improve performance for flow updates to the FIB in silicon.

Packet forwarding latency is only one measure of speed; consider TCAM table update speed, buffer management on the VOQ, total goodput at 90% sustained load, or many other areas besides. 

The commenters highlighting ASIC performance might be working from an overly simplistic model of the internal architecture of a switch and making errors in judgement. ASIC performance is determined by the sum of the software it runs, the network processors, internal buffer management, and much more. 

Consider that Intel x86 server performance is determined by a combination of the operating system, bus speed, memory class and speed, network adapter, and much more. A switch is also a collection of components that determine the overall performance of the unit. 
DavidS327,
User Rank: Apprentice
7/28/2014 | 10:25:41 AM
Re: Tom is talking about things he doesn't know about.
Tom, you said: "Removing unnecessary portions of software code can and does provide a performance boost to workloads. Reducing the memory and CPU footprint of the OS leaves more power available for the data plane. Because we aren't dealing with special-purpose CPUs here, every timeslice we give back to the system is one that can be used to push data that much faster." 

 

This is completely wrong. The job of an ASIC is to forward packets. The ASIC's memory and packet processing are independent of the control-plane CPU and memory. Packets should never be switched in the control plane, or you will have real performance problems. You really need to learn more about ASIC architecture and stop making statements about things you don't know about. Nobody in HFT is customizing the operating system on a switch to get performance gains, because packets don't go to the OS; they get switched at the ASIC. Please don't write an article and post your CCIE credentials at the bottom unless you know what you are talking about.
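
If it helps to picture that separation, here's a very rough conceptual sketch in Python (hypothetical classes, not a real vendor SDK): the control plane only writes entries into the ASIC's tables, and the forwarding decision never touches the OS or CPU:

    # Conceptual model only (hypothetical classes, no real SDK).
    class Asic:
        def __init__(self):
            self.fib = {}                        # hardware forwarding table analogue

        def program_route(self, prefix, port):   # written by the control plane
            self.fib[prefix] = port

        def forward(self, prefix):               # data plane: stays inside the ASIC
            return self.fib.get(prefix, "drop")

    class ControlPlane:
        """Runs routing protocols on the switch CPU and only pushes results down."""
        def __init__(self, asic):
            self.asic = asic

        def converge(self, learned_routes):
            for prefix, port in learned_routes.items():
                self.asic.program_route(prefix, port)

    asic = Asic()
    ControlPlane(asic).converge({"10.0.0.0/8": "eth1"})
    print(asic.forward("10.0.0.0/8"))   # forwarding decision made entirely "in the ASIC"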
aditshar1,
User Rank: Ninja
7/28/2014 | 10:05:28 AM
Re: Depends on IT resources
I guess white box is still in an early stage of deployment, and the major challenge I see is product support, but its low cost and budget appeal are attracting a lot of people.
NetworkingNerd,
User Rank: Apprentice
7/28/2014 | 9:57:54 AM
Re: Tom is talking about things he doesn't know about.
Serrad,

You are correct that many HFT lines of business are customizing field-programmable gate arrays (FPGAs) to accelerate workloads. This is a big part of the push from companies like Arista Networks. However, saying that customizing the software load can't provide performance gains isn't correct either.

Removing unnecessary portions of software code can and does provide a performance boost to workloads.  Reducing the memory and CPU footprint of the OS leaves more power available for the data plane.  Because we aren't dealing with special-purpose CPUs here, every timeslice we give back to the system is one that can be used to push data that much faster.

You are right that HFT really wants to use FPGAs for acceleration. But I think the narrow focus of FPGAs and the specialized knowledge required to program them won't translate well into the wider enterprise and data center market. It's much easier to adapt existing Linux OS experience and familiarity with languages like Python to whitebox than it is to find a VHDL programmer. Having your staff learn to program your switches is the road most want to go down, rather than making your network beholden to another specialized consultant.
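
As a small example of that Linux/Python familiarity point, here's a minimal sketch, assuming the whitebox NOS exposes standard Linux networking under /sys/class/net (details vary by NOS): ordinary sysadmin skills are enough to pull per-port counters, no VHDL required:

    import os

    def interface_counters(sys_net="/sys/class/net"):
        """Read rx/tx byte counters for every interface the NOS exposes to Linux."""
        stats = {}
        for iface in os.listdir(sys_net):
            try:
                with open(os.path.join(sys_net, iface, "statistics", "rx_bytes")) as rx, \
                     open(os.path.join(sys_net, iface, "statistics", "tx_bytes")) as tx:
                    stats[iface] = (int(rx.read()), int(tx.read()))
            except OSError:
                continue    # interface without the standard statistics files
        return stats

    if __name__ == "__main__":
        for iface, (rx_bytes, tx_bytes) in sorted(interface_counters().items()):
            print(f"{iface}: rx={rx_bytes} tx={tx_bytes}")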

And lastly, you are correct that the CCIE doesn't teach these concepts.  There is a lot of outside research that goes into new technologies like SDN and whitebox switching.  It's important for today's networking engineers to keep on top of the changing landscape.  A friend once told me that a CCIE doesn't mean you know everything, but instead means you can learn new things.  That's the kind of approach that networking needs today.
Serrad,
User Rank: Strategist
7/28/2014 | 9:26:18 AM
Tom is talking about things he doesn't know about.
Tom, where are you coming up with "The ability to heavily customize the operating system to provide high performance is very important for some lines of business, such as financial trading"? No HFT traders are customizing anything in the control plane. The customization is happening at the hardware layer (read: FPGA), if anything. There is no "performance" gain to be had by programming the OS of the switch (aside from routing/switching convergence, maybe). I think you need to go back and study more, because your CCIE didn't teach you this.

 
Pablo Valerio,
User Rank: Author
7/28/2014 | 8:59:56 AM
Depends on IT resources
I can see why white-box products could be attractive to large organizations that have the staff to properly install and support the switches. 

But for SMEs, the lack of professional support could come at a high price; they are better off buying from a brand-name vendor.