Is Intel's Romley for You? Getting the Most Out of the Xeon E5-2600

Intel's Xeon E5-2600 supports PCIe 3.0. The performance increase can be substantial, but it can also be lost if I/O peripherals aren't up to the task. Here are the details on the new Romley chips and how best to take advantage of them.

Frank Berry

July 30, 2012


Romley has been one of the most anticipated server platforms from Intel in many years. Romley is Intel's codename for the server platform combining the Sandy Bridge-EP CPU and Patsburg Platform Controller Hub chipset. Finally, on March 6, Intel rolled out a major product launch with a focus on the CPU and its official name--the Xeon E5-2600.

Designed for cloud, enterprise and high-performance computing (HPC) server applications, the Xeon E5-2600 family of processors effectively replaces the Xeon 5500 and 5600 processors by delivering more processing power, cache, memory addressing and I/O bus bandwidth. The Xeon E5-2600 betters the 5600 by adding two more cores, 8 Mbytes more cache, support for six more DIMMs of faster DDR3-1600 memory (increasing total memory capacity to 768 Gbytes), double the I/O bandwidth with PCIe 3.0, and more Intel QuickPath links between processors.
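The doubling claim checks out against the PCIe signaling math. As a back-of-the-envelope calculation (these are figures from the PCIe specifications, not from Intel's launch materials), per-lane throughput works out as follows:

    PCIe 2.0: 5 GT/s x 8/10 encoding     = 4.0 Gbps  ~ 500 MB/s per lane
    PCIe 3.0: 8 GT/s x 128/130 encoding  ~ 7.9 Gbps  ~ 985 MB/s per lane
    Typical x8 adapter slot: ~4 GB/s (Gen 2) vs. ~7.9 GB/s (Gen 3)

The gain comes from the faster 8-GT/s transfer rate combined with the more efficient 128b/130b encoding, which replaces Gen 2's 8b/10b scheme.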

On top of all that, Xeon E5-2600 processors consume less power and deliver lower I/O latency. Because the I/O hub is integrated into the processor, high-bandwidth, low-latency I/O now comes free with any server using the chip. Taken together, these changes drive a new level of server performance.

To fully exploit the capabilities of servers built on the Xeon E5-2600, the broad ecosystem of products surrounding the powerful new processor must undergo a technology refresh, and server adapters are the segment of that ecosystem most affected by new Intel processor technology.

Before Romley, high-bandwidth, low-latency server I/O, such as InfiniBand and purpose-built Ethernet adapters, was the exclusive domain of HPC. Starting this year, Xeon E5-2600 processors will drive the need for ever-higher bandwidth and ever-lower latency into enterprise environments. In this new era, application servers from Main St. to Wall St. will be configured for specific levels of bandwidth and latency.

To keep pace with new processors, the HPC server adapter industry continues to evolve. At the turn of the millennium, 1-Gbit Ethernet emerged to replace 100-Mbit Ethernet for high-performance server connectivity to networks. Around 2006, 10-Gbit Ethernet technology appeared in the core of enterprise networks and as an HPC cluster interconnect. By the end of 2012, server adapters with 40-GbE ports will emerge, followed by the availability of server adapters with 100-GbE ports by 2018. From 2000 to 2018, server adapter latency for HPC applications will be cut in half approximately every 12 years. The baseline for HPC-class server connectivity in the Romley era is now 10 Gbps of bandwidth and 2 microseconds of latency.

The enterprise server adapter industry also continues to evolve. Lagging adoption in high-performance computing servers, 10-Gbit Ethernet did not begin to take off in enterprise servers until 2010. In 2012, 10-GbE server adapter adoption will make a quantum step forward, fueled by an industry-wide initiative to put 10-GbE ports on the motherboard of Romley-based servers. Between 2012 and 2018, 40-Gbit and 100-Gbit Ethernet adapters will become available, with server adapter latency for enterprise computing applications cut in half approximately every six years. The baseline for enterprise-class server connectivity in the Romley era is 10 Gbps of bandwidth and 4 microseconds of latency.

IT organizations looking to exploit the power of Romley-based servers might consider a new class of 10-GbE server adapters with the high message rates and low latency needed in scale-out computing environments, including high-frequency trading, HPC, cloud computing, storage and virtualized data centers. These adapters combine ASIC technology with application acceleration middleware that bypasses the OS kernel and provides fast protocol processing to fully leverage the potential of the Xeon E5-2600.

Unlike 10-GbE server adapters using standard NIC drivers, kernel-bypass middleware lets a NIC skip the overhead of the kernel networking stack, freeing precious CPU resources to crunch on workloads. Compared with previous-generation Intel Westmere-based servers, these 10-GbE PCIe 2.0 server adapters running on newer Intel Xeon E5-2600-based servers with Intel Data Direct I/O (DDIO) technology can cut latency by 500 ns to 600 ns for ping-pong-style traffic, sustain low latency as packet rates climb, and increase multistream packet rate fivefold, to as much as 20 million packets per second.
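As a rough illustration of how such ping-pong latency figures are measured, here is a minimal UDP round-trip probe (a sketch, not any vendor's actual test harness; it assumes an echo server on the far end). Many kernel-bypass middleware packages intercept the standard sockets API at load time, so a benchmark like this can be timed over the kernel stack and over the bypass path without code changes:

    /* Minimal UDP ping-pong latency probe (illustrative sketch). */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <time.h>
    #include <unistd.h>

    #define ITERS 100000

    static long long ns_now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <server-ip> <port>\n", argv[0]);
            return 1;
        }

        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in peer = { .sin_family = AF_INET,
                                    .sin_port = htons(atoi(argv[2])) };
        inet_pton(AF_INET, argv[1], &peer.sin_addr);

        char msg[64] = { 0 };   /* small message, typical of trading traffic */
        long long start = ns_now();

        for (int i = 0; i < ITERS; i++) {
            /* Send a small datagram, then block until the echo returns. */
            sendto(fd, msg, sizeof msg, 0,
                   (struct sockaddr *)&peer, sizeof peer);
            recv(fd, msg, sizeof msg, 0);
        }

        /* One-way latency is approximately half the mean round trip. */
        printf("mean one-way latency: %lld ns\n",
               (ns_now() - start) / ITERS / 2);
        close(fd);
        return 0;
    }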

These 10-GbE server adapters also provide lower jitter, and for applications such as algorithmic trading and trade execution, jitter is a critical performance metric. The term jitter describes the variability over time of packet latency across a network; a network with constant latency has no variation (or jitter). Packet jitter is expressed as an average of the deviation from the network's mean latency. One reference for how well these 10-GbE server adapters perform on real-world workloads is audited testing by the Securities Technology Analysis Center (STAC), a vendor-neutral specialist in creating tests that help vendors and customers understand how different products perform in combination on securities workloads. The STAC-M2 tests report how many messages were passed, how fast they were passed and the deviation (jitter) observed, making it possible to compare products on both latency and jitter. The following definitions describe what the test results mean (a short sketch showing how these statistics are computed follows the list):

Highest supply rate: The maximum number of messages sent per second without causing congestion in the message engine.

Mean: The arithmetic mean, or the average value across every message sent during the three-minute test cycle.

Max: Represents the highest value measured during the test cycle, usually signifying the worst-case message completion at a given load.

Standard deviation: The deviation is a particularly important measure because it signifies how predictable (or deterministic) the traffic is across the environment. A low deviation implies that almost all traffic will complete close to the average, or mean, value.
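For concreteness, here is a minimal sketch of how the mean, max and standard-deviation figures can be derived from a set of per-message latency samples. The sample values are illustrative only, not STAC-M2 results:

    /* Derive mean, max and standard deviation (jitter) from latency
     * samples in nanoseconds. Compile with -lm for sqrt(). */
    #include <math.h>
    #include <stdio.h>

    static void latency_stats(const double *ns, int n)
    {
        double sum = 0.0, max = ns[0];
        for (int i = 0; i < n; i++) {
            sum += ns[i];
            if (ns[i] > max)
                max = ns[i];   /* worst-case message at this load */
        }
        double mean = sum / n;

        /* Jitter: average spread of samples around the mean latency. */
        double var = 0.0;
        for (int i = 0; i < n; i++)
            var += (ns[i] - mean) * (ns[i] - mean);
        double stddev = sqrt(var / n);

        printf("mean %.0f ns  max %.0f ns  stddev (jitter) %.0f ns\n",
               mean, max, stddev);
    }

    int main(void)
    {
        /* Illustrative samples: mostly ~4 us with one outlier. */
        double samples[] = { 4100, 3950, 4220, 4005, 7400, 4080 };
        latency_stats(samples, sizeof samples / sizeof *samples);
        return 0;
    }

A low standard deviation relative to the mean is what distinguishes a deterministic adapter from one that is merely fast on average.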

The availability of the Xeon E5-2600 brings high-performance computing to the masses. Over time, it will also transform the ecosystem around it into high-performance server, storage and networking products for the masses. To take advantage of its processing power for such tasks, IT managers should consider deploying 10-GbE server adapters with TCP offload and kernel-bypass middleware.
