File I/O stacks in operating systems are a textbook example of Parkinson’s Law, which says that software expands to use the resources available. Last time I counted, the current Linux stack with an attached SSD performs a total of six or seven address translations. That’s a huge software overhead, and you might be forgiven for wondering if the objective of the fastest possible I/O got lost in translation!
Fortunately, relief is at hand. The industry has created a new standard that enterprise SSD vendors are aggressively picking up: NVMe. The idea seems so simple that you wonder why we didn’t go there before, but IT is often very conservative, and this is a radical change.
NVMe stands for “Non-Volatile Memory Express.” On the hardware side, the solution removes the disk interface chip that used to sit between the drive and PCIe on the motherboard, allowing the drive to talk directly to the PCIe bus and thus to system memory. This idea isn’t quite as radical as it sounds. Electrically, PCIe and SAS/SATA are very similar, so it’s more a protocol change than a completely new concept.
That protocol change is a major rethink of storage I/O handling. First, the SCSI stack is replaced by a system of circular queues: the host adds I/O jobs to a submission queue, and the drive pulls entries out as it becomes ready to process them. The drive does all the heavy lifting, moving data via DMA and posting status back through a completion queue.
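The circular-queue handoff can be sketched in a few lines. This is a toy model, not real driver code: the class and method names are my own invention, and real NVMe queues live in shared memory with doorbell registers rather than Python lists.

```python
class QueuePair:
    """Toy model of an NVMe-style submission/completion queue pair.
    Illustrative only; names and structures are not from the NVMe spec."""

    def __init__(self, depth):
        self.depth = depth
        self.sq = [None] * depth   # circular submission queue (host writes)
        self.cq = []               # completion queue (drive posts status)
        self.sq_tail = 0           # host advances tail when submitting
        self.sq_head = 0           # drive advances head when consuming

    def submit(self, command):
        """Host places a command at the tail, then 'rings the doorbell'."""
        next_tail = (self.sq_tail + 1) % self.depth
        if next_tail == self.sq_head:
            raise RuntimeError("submission queue full")
        self.sq[self.sq_tail] = command
        self.sq_tail = next_tail   # in hardware: a tail-doorbell register write

    def drive_process_one(self):
        """Drive pulls the next entry and posts status to the completion queue."""
        if self.sq_head == self.sq_tail:
            return None            # queue empty, nothing to do
        command = self.sq[self.sq_head]
        self.sq_head = (self.sq_head + 1) % self.depth
        self.cq.append((command, "SUCCESS"))  # status entry for the host to reap
        return command
```

The key point the model captures is that host and drive never block each other: the host only moves the tail, the drive only moves the head, and status flows back asynchronously through the completion queue.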
Interrupts, which reach horrendous levels for traditionally connected high-end SSDs, drop dramatically thanks to a coalescing mechanism. Adding all this up: I/Os proceed faster because all those translations move to the drive and are consolidated, the I/O stack is greatly simplified and shortened, and there are fewer interrupts. The result is lower system load, faster I/O operation, and a lot more headroom for larger configurations of drives.
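The coalescing idea is simple enough to sketch. In this toy model (names are illustrative; real NVMe coalescing also includes a timer so completions are never delayed indefinitely), the drive raises one interrupt per batch of completions instead of one per I/O:

```python
class CoalescingDrive:
    """Toy interrupt-coalescing model: one interrupt covers a whole
    batch of completed I/Os. Illustrative only, not the NVMe mechanism."""

    def __init__(self, threshold):
        self.threshold = threshold  # completions accumulated per interrupt
        self.pending = 0            # completions since the last interrupt
        self.interrupts = 0         # interrupts actually raised

    def complete_io(self):
        self.pending += 1
        if self.pending >= self.threshold:
            self.interrupts += 1    # one interrupt for the entire batch
            self.pending = 0
```

With a threshold of 8, a burst of 64 completions raises 8 interrupts instead of 64, which is where much of the reduced system load comes from.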
Recently, the University of New Hampshire InterOperability Lab ran a plugfest to demonstrate interoperability of drives and hosts. The results were excellent and justify claims that NVMe has arrived. Vendors are announcing products for both systems and drives, and we can expect a flurry of activity throughout 2014.
From a usage viewpoint, these systems provide multi-use drive bays. The industry has defined a connector scheme that supports SAS, SATA, and NVMe, with software-defined port protocols offered by the motherboard I/O chipsets. This gives us hot-swap removable drive bays that support NVMe, so an NVMe drive behaves just like any other drive in that respect.
At least for a while, NVMe drives will have a price premium reflecting their performance, but the overall benefit will make it worthwhile. Typical use cases, such as big data analytics and in-memory databases, need every bit of performance available.
Are there any drawbacks? Cabling is limited to a few inches, so NVMe is usable only within servers; that could change in a year or so, with PCIe extenders and switches under discussion. However, the flexibility and performance of RDMA over Ethernet, plus the existing Ethernet infrastructure industry, will probably make RDMA the out-of-the-box winner for reaching outside the server.
NVMe will radically change the top tier of solid-state storage, and will do so quickly. All of the major operating systems have drivers and support for it, and drives and systems are available from the major players. This is as radical a change as Fibre Channel in the 90s and SCSI in the 80s, and we can expect significant expansion of the approach going forward.