Well, after I reviewed some real-world CNA test results last week, I started thinking that perhaps what Emulex meant by "fully realize the capacity of the new Intel Xeon-based servers" was to choke the CPU so that it had no room left for additional functions. Before we review the test results, let's take a look at the test configuration that was used. The storage units under test were a Texas Memory RAMSAN 325 and a RAMSAN 400, connected to a Cisco Nexus 5020 switch, which in turn was connected to the FCoE adapters in the server. The server was a dual-socket, quad-core 2.8GHz Nehalem system with 24GB of RAM and PCIe connectivity. The performance analysis tool used was IOmeter.
From an overall test environment perspective, standard, generally available FCoE converged network adapters were used for performance testing. The products under test used the most current software and firmware levels available at the time the tests were run, and the test environment was well defined, documented and reproducible. In my opinion, the test environment was designed to provide an unconstrained analysis and assessment of adapter performance. Tests were run with block sizes ranging from as minuscule as 0.5KB up to 1020KB, on both single-port and dual-port FCoE CNA configurations.
The test results from an input/output operations per second (IOPS) perspective reveal that in some cases the QLogic 8152 delivers higher IOPS than the Emulex UCNA, and in other cases the Emulex UCNA delivers higher IOPS than the QLogic 8152. From a megabytes per second perspective, the results were similar. However, there was a striking difference in CPU utilization for the Emulex UCNA, especially at block sizes from 0.5KB to 8KB and at 16KB, in both sequential read and sequential write testing.
In one test case, the CPU utilization for the Emulex UCNA was a whopping 80 percent. This case was for sequential reads at 0.5KB block size: 247K IOPS on QLogic and 841K IOPS on the Emulex UCNA, with a concerning 80.66 percent CPU utilization for the Emulex UCNA vs. 11.16 percent for QLogic. In fact, I understand that in some tests the Emulex UCNA drove CPU utilization north of a staggering 90 percent. Now we know why Emulex didn't talk about CPU utilization when it was broadcasting 1M IOPS: the performance was extracted at the expense of the CPU, using a block size so small that it bears no resemblance to a real-world workload. And at real-world block sizes (4KB, 8KB), the performance of the products is similar, with Emulex's UCNA still exhibiting much higher CPU overhead.
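One way to put the two adapters on the same footing is to divide the IOPS each delivered by the CPU utilization it consumed. The short Python sketch below does that for the 0.5KB sequential read figures quoted above. The function name and the "IOPS per CPU percentage point" metric are my own illustration, not something reported by the test team, and the calculation assumes CPU cost scales linearly with I/O rate, which these two data points alone cannot confirm.

```python
def iops_per_cpu_point(iops: float, cpu_percent: float) -> float:
    """Return IOPS delivered per percentage point of CPU utilization.

    Hypothetical helper for illustration; assumes CPU cost is linear in I/O rate.
    """
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return iops / cpu_percent


# Figures quoted in the text: sequential reads at 0.5KB block size.
results = {
    "QLogic": {"iops": 247_000, "cpu_percent": 11.16},
    "Emulex UCNA": {"iops": 841_000, "cpu_percent": 80.66},
}

# Normalize each adapter's throughput by the CPU it consumed.
efficiency = {
    name: iops_per_cpu_point(r["iops"], r["cpu_percent"])
    for name, r in results.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:,.0f} IOPS per CPU percentage point")
```

On these two data points the QLogic adapter works out to roughly 22K IOPS per CPU percentage point and the Emulex UCNA to roughly 10K, so under the linearity assumption each I/O on the UCNA costs about twice the CPU.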
I asked the folks running the tests to take a closer look at why the CPU utilization would be so high in the Emulex case vs. the lower numbers for the QLogic CNA. After a careful review of the lpfc upstream driver, it appears that there is a significant difference between the Emulex driver implementation for its 4/8Gb/sec Fibre Channel adapters and the one for its UCNA. It also appears that relevant functions have been moved into the driver, where they run on the host CPU, rather than residing in firmware on the UCNA adapter card itself. That would explain why the server CPU is so busy processing I/O. This led me to ask, "What's the big deal with this high level of CPU utilization?" Well, I believe, for starters, that users will expect vendors to use the same proven stack on FCoE adapters as has been used on more traditional Fibre Channel adapters as we move forward into this new era of connectivity. It is unclear how all this will affect the many operating systems that new FCoE cards need to be deployed within, and how that will affect overall server performance over time.