BufferBloat And The Collapse Of The Internet

David Greenfield

April 21, 2011

It seems that every few years there's yet another prognosticator that the Internet is about to collapse. Once it was the stellar growth in bandwidth demand driven by the phenomenal increase in Internet-connected devices. At other times, it was the lack of Net neutrality (see this video). Still other times, it was sinister attacks on BGP or the fact that we've run out of IPv4 addresses.

So, here we are again, and yet another problem has been unearthed that could dismantle the Net. It's BufferBloat, and while the hype might be hot, the challenge is very real. So much so that it's got some of the biggest guns in Internet research today thinking about the issue. It's also got me thinking about an often overlooked buying criterion that WAN optimization buyers need to be watching.

The chief preacher of the BufferBloat gospel is Jim Gettys, a researcher at Alcatel-Lucent Bell Labs. Gettys presented his findings to the IETF last month and described how he identified the problem. He first noticed it at home, where he was seeing extremely high latency despite being on a high-speed Internet connection.

After extensive research, he found that packets were being stored in various devices in his path--in transmit queues often used for traffic classification, in ring buffers within device drivers, and in some cases in end nodes. (He points to the Marvell One Laptop Per Child system, for example, which was found to hold four packets.) As packets were queued, latency mounted. To put that in perspective, if 256 packets of 1,500 bytes each were buffered on a 10Mbps line, that would equate to roughly 3 million bits, or about a third of a second of delay.
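To see where that figure comes from, here's a quick back-of-the-envelope calculation. The 1,500-byte packet size is my assumption (it's the typical Ethernet MTU); the article's numbers only work out at roughly that size:

```python
# Back-of-the-envelope buffering delay: 256 queued packets on a 10Mbps link.
PACKET_SIZE_BITS = 1500 * 8          # 1,500-byte packets (assumed typical MTU)
BUFFERED_PACKETS = 256
LINK_RATE_BPS = 10_000_000           # 10 Mbps

buffered_bits = BUFFERED_PACKETS * PACKET_SIZE_BITS    # ~3.07 million bits
delay_s = buffered_bits / LINK_RATE_BPS                # ~0.31 seconds

print(f"{buffered_bits:,} bits queued -> {delay_s:.2f} s of added delay")
```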

But delay is only part of the story. As Gettys showed, the added latency disrupts higher-layer protocols. TCP's Round Trip Time (RTT) estimator, for example, gets skewed by the queuing delay, so TCP can't work out the right rate at which to send data. It ends up trying to send too much, buffers fill further, and latency climbs again (a rough sketch of that feedback loop appears below). You can read a detailed account of Gettys' efforts here.

Over time, IT managers can expect the vendor community to take a proactive stance on Gettys' suppositions--if they hold true. It's reasonable to assume that over the next few years, best practices will emerge and BufferBloat may be conquered. But what about installed networking equipment?
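As a rough illustration of that feedback loop--my own sketch, not Gettys' measurements--here is how TCP's smoothed RTT estimator (per RFC 6298) chases a growing standing queue. The base RTT and per-sample queue growth are assumed numbers chosen only to show the trend:

```python
# A minimal sketch of how a growing standing queue drags up TCP's smoothed RTT
# estimate (RFC 6298: SRTT = 7/8 * SRTT + 1/8 * measured RTT).
ALPHA = 1 / 8
BASE_RTT_MS = 20.0            # assumed "true" path RTT with empty buffers
QUEUE_GROWTH_MS = 15.0        # assumed extra queuing delay added each sample

srtt = BASE_RTT_MS
for sample in range(1, 11):
    measured = BASE_RTT_MS + sample * QUEUE_GROWTH_MS   # queue keeps filling
    srtt = (1 - ALPHA) * srtt + ALPHA * measured
    print(f"sample {sample:2d}: measured {measured:5.1f} ms, SRTT {srtt:5.1f} ms")
```

Because the bloated buffers delay (rather than drop) packets, the sender never gets the early congestion signal it needs, and its picture of the path keeps inflating.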

In purchasing a WAN optimizer, IT managers should pay attention to the delay the device imposes on a given packet. And I don't just mean how well it optimizes TCP or how quickly it can return an object from cache. I'm talking about looking at the various operations the WAN optimizer performs--the compression, the encryption, the deduplication and more--and getting an accurate read on just how much latency is introduced in traversing the system. This will give you a sense of the BufferBloat of that system, a concept Gartner calls "insertion latency".
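For those who want to check vendor claims themselves, here is a minimal sketch of one way to approximate insertion latency: time small probes through the appliance, then again with it bypassed, and compare. The UDP echo targets and addresses are placeholders for your own lab setup, not anything a particular product supplies:

```python
import socket
import statistics
import time

def median_probe_rtt(host: str, port: int, count: int = 100) -> float:
    """Median round-trip time (ms) of small UDP probes against an echo responder."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(1.0)
        for _ in range(count):
            start = time.perf_counter()
            sock.sendto(b"probe", (host, port))
            try:
                sock.recvfrom(64)
                rtts.append((time.perf_counter() - start) * 1000)
            except socket.timeout:
                pass  # lost probes are worth tracking separately
    return statistics.median(rtts)

# Usage (addresses are placeholders): run once with the optimizer in the path and
# once with it bypassed; the difference approximates the device's insertion latency.
# delta_ms = median_probe_rtt("10.0.0.50", 7) - median_probe_rtt("10.0.0.51", 7)
```

It's a crude measure--it won't separate compression from deduplication delay--but it puts a number on the end-to-end cost of sitting behind the box.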

I know there are several ways to perform WAN deduplication, for example. Some include tokens, others include start/stop instructions. Do some of these methods add more insertion latency than others? What impact does this have on the types of applications that can be optimized? I would think 10 ms or less would be required to support real-time traffic like voice and VDI. I'd welcome voices from the vendor community who are willing to share with us what their delay looks like.

Then there's the more fundamental issue of preventing the packet loss that contributes to the effects of BufferBloat. How much packet loss matters is hotly contested by Silver Peak, Riverbed and others, but increasingly it seems that no sane IT manager can ignore the effects of loss on their WANs. Addressing the issue of loss in the near and long terms is essential for network functioning. Those aren't just my words. Go ask Gettys.
