What Comes After RAID? Erasure Codes
December 16, 2010
As I mentioned a few blog entries ago, the basic math behind parity-based RAID (Redundant Array of Inexpensive Disks) solutions is starting to break down. While I think it's important for those of us who spend our days thinking about these things to raise the alarm, it's more important to think and write about the technologies that can take us past parity RAID. One major contender is Reed-Solomon erasure codes, which vendors are starting to use as an efficient alternative to parity or mirroring.
As we've previously discussed, the problem with parity RAID is that disk drives have been getting bigger much faster than they've been getting faster or more reliable. In just a few years, we'll be buying 10TB (1x10^13 bytes, or 8x10^13 bits) drives that will take 10 hours or more to read end to end, pushing a RAID rebuild into a multi-day event with a high probability of failure due to a read error on one of the other drives.
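To put rough numbers on that claim, here is a back-of-the-envelope sketch. The sustained read rate (280MB/s), the unrecoverable read error rate (one per 10^14 bits read, a figure commonly quoted on drive spec sheets) and the 6+1 RAID-5 set are all my assumptions for illustration, not measurements of any particular drive or array.

```python
import math

TB = 10**12  # decimal terabyte, as drive vendors count

def full_read_hours(capacity_bytes, throughput_bytes_per_s):
    """Hours needed to read a drive end to end at a sustained rate."""
    return capacity_bytes / throughput_bytes_per_s / 3600

def prob_read_error(bytes_read, errors_per_bit):
    """Chance of at least one unrecoverable read error while reading bytes_read.

    Computes 1 - (1 - p)^bits with log1p/expm1 so the tiny p is not lost to rounding.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-errors_per_bit))

capacity = 10 * TB
hours = full_read_hours(capacity, 280e6)        # assumed 280 MB/s sustained

# Rebuilding one failed drive in an assumed 6+1 RAID-5 set means reading
# all six surviving drives in full.
p_fail = prob_read_error(6 * capacity, 1e-14)   # assumed 1 error per 10^14 bits

print(f"full read: {hours:.1f} hours, chance of a read error during rebuild: {p_fail:.1%}")
```

Under those assumptions the end-to-end read takes just under 10 hours, and the rebuild is almost certain (about 99%) to hit at least one unreadable sector on the surviving drives.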
Reed-Solomon erasure codes (the "erasure" comes from their original use as a forward error correction method for sending data over an unreliable channel that may fail to transfer, or erase, some data) can extend the data protection model from RAID-5's simplistic n+1 to substantially higher levels of protection. Rather than separating the data from error correction or check data as parity and CRCs do, erasure codes expand the data, adding redundancy so even if a portion of the data is mangled or lost, the original data can be retrieved from the remaining portion.
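The idea of expanding data so that any large-enough subset can rebuild it is easier to see in a toy. The sketch below is in the spirit of Reed-Solomon but is not how any product does it: it treats k data bytes as points on a polynomial over the small prime field GF(257) and hands out n other points on that polynomial as chunks. Real implementations typically work in GF(2^8) so every symbol fits in a byte. The names `encode`, `decode` and `_interpolate` are mine.

```python
P = 257  # small prime; symbols are 0..256 (toy choice, not byte-aligned)

def _interpolate(points, x):
    """Value at x of the unique polynomial of degree < len(points)
    passing through points, with all arithmetic done mod P (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # pow(.., -1, P) needs Python 3.8+
    return total

def encode(data, n):
    """Expand k data bytes into n chunks; none of the chunks is the raw data."""
    k = len(data)
    if k + n > P:
        raise ValueError("too many points for this toy field")
    points = list(enumerate(data))           # data sits at x = 0..k-1
    return [(x, _interpolate(points, x)) for x in range(k, k + n)]

def decode(chunks, k):
    """Rebuild the k data bytes from any k surviving chunks."""
    if len(chunks) < k:
        raise ValueError("not enough chunks to reconstruct the data")
    survivors = chunks[:k]                   # any k chunks pin down the polynomial
    return bytes(_interpolate(survivors, x) for x in range(k))

data = b"erasure!!"                          # k = 9 bytes
shares = encode(data, 12)                    # n = 12 chunks
survivors = [shares[i] for i in (0, 2, 3, 5, 6, 7, 9, 10, 11)]  # three chunks lost
recovered = decode(survivors, len(data))
print(recovered == data)
```

Which three chunks go missing doesn't matter; any nine of the twelve determine the same polynomial, and evaluating it back at positions 0 through 8 returns the original bytes.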
Erasure codes have been around for decades, in applications ranging from data transmissions from deep space probes, where the several minutes of latency make a TCP-style timeout and retransmit impractical, to CDs and DVDs, which use erasure codes to handle dust, scratches and other impairments of the vulnerable disc. They've even made it quietly into enterprise storage as vendors use Reed-Solomon math to calculate the ECC (Error Correcting Codes) for their n+2 RAID-6 implementations.
Erasure codes get really interesting, however, when we up the ante beyond n+2 as several vendors have. NEC's HYDRAstor deduplicating grid system uses erasure codes to spread each data chunk across twelve disk drives in the grid. With a protection level of 9 of 12, the original data can be reconstructed from any nine of the twelve fragments. HYDRAstor users can set the protection level as high as 6 of 12, which would have the same 100% overhead as mirroring (half the raw capacity holds redundancy), but still be able to deliver the data after six drive failures.
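The trade-off between protection level and capacity is simple arithmetic, sketched here for a generic "k of n" scheme. The function name and the two ways of expressing overhead are mine; mirroring is modeled as 1 of 2.

```python
def protection_profile(k, n):
    """Capacity cost and failure tolerance of a 'k of n' erasure code."""
    return {
        "tolerated_failures": n - k,             # chunks that can be lost
        "overhead_vs_data": (n - k) / k,         # extra storage relative to user data
        "redundant_share_of_raw": (n - k) / n,   # fraction of raw capacity holding redundancy
    }

for label, (k, n) in {"9 of 12": (9, 12), "6 of 12": (6, 12), "mirror": (1, 2)}.items():
    prof = protection_profile(k, n)
    print(f"{label:8s} survives {prof['tolerated_failures']} failures, "
          f"{prof['overhead_vs_data']:.0%} overhead, "
          f"{prof['redundant_share_of_raw']:.0%} of raw capacity is redundancy")
```

The 6 of 12 and mirrored rows cost the same capacity, but the mirror survives only one failure per pair while the erasure-coded layout survives any six.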
Cleversafe has extended erasure coding to add location information, creating what they call dispersal coding. This lets them ensure that blocks are stored, not just on different disk drives or different nodes of their RAIN cluster, but even in different data centers. Using their default coding, which requires 10 of the 16 chunks created for each data stripe in order to reconstruct the original data, you can tell their system to disperse the data across three data centers and still be able to read the data if one data center goes offline, with only 60% overhead (the redundancy takes up less than 40% of raw capacity). A typical replicated solution in three data centers would have 200% overhead.
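Whether a layout survives the loss of a whole site comes down to how many chunks each site holds. The sketch below assumes the chunks are spread as evenly as possible across sites; that placement rule and the function names are my own illustration, not Cleversafe's documented algorithm.

```python
def spread(n, sites):
    """Split n chunks across sites as evenly as possible, e.g. 16 over 3 sites."""
    base, extra = divmod(n, sites)
    return [base + (1 if i < extra else 0) for i in range(sites)]

def survives_site_loss(n, k, sites):
    """True if at least k chunks remain readable after losing any single site."""
    return all(n - lost >= k for lost in spread(n, sites))

n, k, sites = 16, 10, 3
layout = spread(n, sites)
ok = survives_site_loss(n, k, sites)
dispersed_overhead = (n - k) / k    # extra storage relative to user data
replicated_overhead = sites - 1     # full copies beyond the first, one per extra site

print(f"layout {layout}, survives one site loss: {ok}")
print(f"dispersed overhead {dispersed_overhead:.0%}, replicated overhead {replicated_overhead:.0%}")
```

With 16 chunks over three sites the busiest site holds six, so losing it leaves exactly the ten chunks needed, with no margin for an additional drive failure until the site returns.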