If we look at storage appliances in the broad view, including traditional RAID arrays as well as new universal stores, the transition to all solid-state-drive (SSD) technology that will occur over the next couple of years has profound implications. Changing hard drives to SSDs isn’t just a case of plugging one SATA drive in place of another.
The best analogy is upgrading from a golf cart to a Ferrari -- the driving rules and how and where you drive have to change or you’ll never get the Ferrari out of first gear! Consequently, storage vendors offering all-SSD appliances need to address performance issues with networking, the controller boards and software.
In the traditional array, the drive controllers are designed around slow hard drives. Latencies are vastly better for SSDs, and the number of random IO operations they can deliver is so much higher that a typical SSD can outperform a 60-drive HDD array. This is why we see “hybrid” arrays. Sold as the “right balance of drive types,” these are in reality an acknowledgement that the storage array can’t do any better. The controllers in hybrid arrays are designed for perhaps 750K IOPS aggregated across the drives, and a couple of SSDs can achieve that by themselves.
Another issue with RAID arrays is the calculation of parity at very high speeds. RAID 5 is essentially dead, due to the rebuild time after a failure, and the problem remains even when SSDs are used in a hybrid array. RAID 6, its replacement, takes far too much horsepower to calculate the parity syndromes even with HDDs, and would drastically slow down SSDs.
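To see why RAID 6 is so compute-hungry, here is a minimal sketch of the P and Q syndrome arithmetic it must perform for every byte position in every stripe (following the standard GF(2^8) formulation); the function names and the tiny three-disk stripe are illustrative only, and real controllers use table-driven or SIMD versions of the same math.

```python
def gf_mul2(x):
    """Multiply a byte by the generator g=2 in GF(2^8), RAID-6 polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
    return x & 0xff

def pq_syndromes(data_bytes):
    """P (simple XOR parity) and Q (Reed-Solomon syndrome) for one byte position.

    data_bytes[i] is the byte from data disk i. Q = sum of g^i * D_i,
    accumulated via Horner's rule, highest-index disk first.
    """
    p, q = 0, 0
    for d in reversed(data_bytes):
        p ^= d
        q = gf_mul2(q) ^ d
    return p, q

# One byte position across a 3-data-disk stripe:
p, q = pq_syndromes([1, 2, 3])
```

This loop runs once per byte of every write; at SSD speeds that is billions of GF multiplies per second, which is the horsepower problem the article describes.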
The latest storage appliance concept, universal storage based on Ceph, Scality Ring or OpenStack, delivers file, object or block-IO based on the one underlying data store. Because the approach is new, it’s taking advantage of the availability of low-cost COTS hardware platforms to create storage appliances that typically have 10 or 12 drives in a compact cabinet.
Universal storage is a more Lego-like approach, spanning products from cost-efficient Coho Data to DDN’s WOS appliances, which target rich feature sets and high performance. Smaller drive-to-controller ratios fit the SSD throughput level much better, since drives are typically connected directly to a motherboard without a RAID controller creating a bottleneck. Data integrity is normally implemented at an appliance level rather than at the traditional drive level, so RAID is not used for long-term storage. (RAID 1 mirroring IS used on the journal files that speed up server write operations -- mirroring is fast and can easily be implemented in software.)
Long-term, integrity is moving to erasure coding, which is implemented on secondary HDD storage either in the appliance or in attached JBODs. Erasure coding is compute-intensive, and currently not fast enough to keep up with SSD, but the two-tier model of fast SSD primary storage and slow HDD secondary storage works well if data is staged from SSD to HDD as a background job.
If we replace all the HDDs with SSDs, what happens to these storage appliances? Clearly, RAID arrays struggling with two SSDs won’t cope with 60 drives, never mind the hundreds on large array configurations. Interfaces won’t carry the load and controllers won’t keep up.
Even the new compact appliances are in trouble. Twelve SSDs at 500K IOPS each make 6 million IOPS. Designs will need more, and faster, interfaces. Fortunately, some help is at hand. We will have 25G and 100G Ethernet interfaces to replace the current 10G and 40G connections. But those aren't enough to cope with all the added load.
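The back-of-envelope arithmetic makes the interface problem concrete; the 4K IO size is an assumption for illustration, and real workloads mix sizes:

```python
# Illustrative numbers: 12 SSDs at 500K random IOPS each, 4 KB per IO.
ssd_count = 12
iops_per_ssd = 500_000
io_size_bytes = 4096

total_iops = ssd_count * iops_per_ssd                    # 6 million IOPS
throughput_gbit = total_iops * io_size_bytes * 8 / 1e9   # wire bandwidth needed

print(total_iops)        # 6000000
print(throughput_gbit)   # 196.608 -- nearly two full 100G Ethernet ports
```

Even under these modest assumptions, a single compact appliance saturates a 100G link, which is why faster interfaces alone won't be enough.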
At the IO rates we are going to see, interrupt handling and task switching, both in the appliance and the server, will be an issue. This points to NVMe over Fabrics as the probable choice of protocols for drives, coupled with RDMA transfers to reduce overhead and latency.
Erasure coding with all-SSD will be a major problem. Syndrome calculations that merely had to keep pace with slow HDDs must now keep up with SSDs, which seriously ups the ante. A technology step is sorely needed to bring erasure coding to the SSD world. We’ll likely see hardware solutions, using FPGAs to accelerate the calculation, but early signs are that they won’t be fast enough. This remains a serious obstacle for the storage industry in moving on from the past.
There has been some discussion about taking controllers out of the picture altogether and using Ethernet drives to connect directly to servers. This asymmetric pooling approach has tremendous advantages, since it removes a non-RDMA SATA transmission, all the host/controller overhead, and multiple retransmissions of data from the networks.
NVMe over Fabrics and RDMA are still compatible with this scheme, which puts the software services of storage into the server farm. This is very much a software-defined storage (SDS) approach, incidentally. It looks like both Ceph and Scality could migrate quickly to support this model, since both are very modular in structure.
This still leaves erasure coding to be dealt with. Now, in the SDS model, the servers must generate the codes, so we need to identify the best way to do this. Many CPU cores could handle the issue, but at a substantial cost to total available CPU power in the server pool. Whichever way we go, this will be a major creative challenge for the industry.
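To illustrate what the servers would be computing in the SDS model, here is a minimal sketch of erasure coding with a single XOR parity shard; production systems use Reed-Solomon codes with several parity shards (e.g. 10 data + 4 parity), and the shard contents and counts here are purely illustrative.

```python
from functools import reduce

def encode(data_shards):
    """Return the parity shard for a stripe of equal-length data shards (XOR)."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), data_shards))

def reconstruct(surviving_shards, parity):
    """Rebuild the single missing data shard by XOR-ing parity with the survivors."""
    return encode(list(surviving_shards) + [parity])

shards = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(shards)

lost = shards.pop(1)                       # simulate losing shard 1
assert reconstruct(shards, parity) == lost  # recovered from survivors + parity
```

Every byte written must pass through loops like these (and Reed-Solomon adds GF multiplies on top of the XORs), which is why doing this in server CPUs eats a substantial share of the pool's compute.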