Block I/O has been the mainstay of computer storage for the best part of five decades. In that time, we’ve seen error-correction codes used to repair block corruption, SCSI invented as a way to use drives with much higher defect counts, and redundant array of inexpensive disks (RAID) as a way to handle drive failures at the macroscopic level.
All of these storage technologies deserve a place in an IT Hall of Fame. Each allowed the storage industry to move to a new plateau of stability. But simple error-correction codes no longer suffice to correct errors on today's dense drives; we’ve moved on from SCSI to newer interfaces (though SCSI still forms the basis for most of them); and now RAID is moving into its twilight stage.
RAID uses either mirroring or parity blocks to protect data integrity. Basic RAID is designed to tolerate a single drive failure, but drive densities have reached the point where the window to rebuild a failed drive is so long that a second failure can occur during the rebuild and cause data to be lost.
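The parity idea behind basic RAID can be sketched in a few lines: one XOR parity block lets any single lost block be rebuilt from the survivors. This is an illustrative sketch only, not any array's actual implementation:

```python
# A minimal sketch of RAID-5-style single-parity protection,
# assuming fixed-size blocks represented as bytes.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing data block 1: rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

The same XOR trick is why a second failure during a rebuild is fatal: with two blocks missing, the surviving blocks plus one parity block no longer determine the lost data.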
More advanced RAID solutions have attempted to solve the problem, but they involve hardware costs and much longer write times. Worse, while the added protection means two drives can fail without data loss, all the drives still sit in the same appliance, so a relatively long loss of access to data is unavoidable if that appliance fails. RAID 6, as this version is called, also mandates that all drives be replaceable, which adds a lot of cost.
The arrival of very fast SSDs on the scene changed the picture even more. Even single-parity RAID 5 slows writes to SSDs to an unacceptable level, and RAID 6 is slower than molasses. Clearly, something had to give, and new integrity schemes such as replication and erasure coding have offered a way forward.
Replication uses multiple copies of a data object to provide protection. The subtle difference from RAID is that the copies are kept on different storage appliances, and may even be geographically dispersed. That protects against appliance failures, fire and flood, and major power outages, among other things.
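The placement rule that distinguishes replication from RAID — copies must land in distinct failure domains — can be sketched as follows. The zone and appliance names here are hypothetical, purely for illustration:

```python
# A minimal sketch of replica placement across failure domains.
# Zone and appliance names are hypothetical examples.

def place_replicas(zones, copies):
    """Pick one appliance from each of `copies` distinct zones."""
    if copies > len(zones):
        raise ValueError("not enough failure domains for requested copies")
    placement = []
    for zone, appliances in list(zones.items())[:copies]:
        placement.append((zone, appliances[0]))
    return placement

zones = {
    "us-east": ["appliance-1", "appliance-2"],
    "us-west": ["appliance-3"],
    "eu-west": ["appliance-4"],
}
placement = place_replicas(zones, copies=3)

# Every copy sits in a different zone, so no single appliance,
# site fire, or regional outage can take out all three.
assert len({zone for zone, _ in placement}) == 3
```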
Erasure coding uses a sort of “super-RAID” to add extra blocks to a data set. These blocks can also be dispersed over a number of appliances, and the resulting protection can allow three or more drives or appliances to fail before data becomes unavailable. One common form adds six parity blocks to every 10 data blocks; any six of the 16 blocks can then be lost and the data can still be reconstructed. That’s really powerful protection!
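The economics of that 10+6 scheme versus plain replication can be checked with simple arithmetic (a sketch; real systems tune these parameters):

```python
# Storage overhead vs. fault tolerance:
# 3-way replication against 10+6 erasure coding.

def replication_overhead(copies):
    """Extra raw capacity needed, as a fraction of logical data."""
    return copies - 1            # e.g. 3 copies -> 2.0 (200% extra)

def ec_overhead(data_blocks, parity_blocks):
    """Extra raw capacity for an erasure-coded stripe."""
    return parity_blocks / data_blocks   # e.g. 10+6 -> 0.6 (60% extra)

# 3-way replication tolerates 2 lost copies at 200% overhead;
# 10+6 erasure coding tolerates 6 lost blocks at only 60% overhead.
assert replication_overhead(3) == 2
assert abs(ec_overhead(10, 6) - 0.6) < 1e-9
```

That gap — more failures tolerated for less than a third of the extra capacity — is a big part of why cloud-scale operators favor erasure coding for cold data, at the price of more compute on writes and rebuilds.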
Just as importantly, the new replication approaches don’t need super-expensive RAID arrays to achieve results. Most of the RAID vendors have charged whopping amounts for each drive; the advent of the cloud approach meant that cloud service providers like Google and Amazon had to look for alternatives.
Replication and cheap commodity drives proved to be the answer. Inexpensive consumer drives worked really well in replicated storage systems. The world didn’t end because of any drive durability issues, which turned out to have been overstated.
Taken together, low-cost drives, simple controllers using COTS servers as head nodes, and a much lower price point are making RAID arrays look like IBM’s mainframes did in the 1980s. We are seeing a UNIX-like revolution in storage, with inexpensive COTS-based gear taking major market share from traditional vendors.
The revolution is only just beginning. The cloud has opened the door for very low-margin Chinese vendors to take the same machines they’ve been delivering in the millions to AWS and Google and sell them to the commercial marketplace.
But even these ODMs are caught up in a tsunami of change. We are beginning to evolve to a software-defined storage model where the hardware is somewhat undifferentiated commodity gear, while the configuration and differentiation migrates into software.
The combination of cheap hardware and flexible software means the end of the road for simple, staid, but slow RAID. It won’t happen overnight -- there are still mainframes 30 years after UNIX's arrival -- but RAID development will stop and existing gear will move to a secondary role.
It’s worth noting that there are some subtle differences in the storage revolt compared with the UNIX revolution. SSDs will make hard drives obsolete in just a couple more years, and mainframe hard-disk drive RAID arrays will be like two-horse carriages in a land of Ferraris. The mainframe never faced that level of obsolescence. Likely, the demise of the RAID array will be quick.
Still, EMC Chairman and CEO Joe Tucci will be retiring just before RAID's decline accelerates. EMC went years without permanently losing a bit of data. That’s a superb record and a fitting swan song for the RAID array!