What Comes After RAID? Erasure Codes

As I mentioned a few blog entries ago, the basic math behind parity-based RAID (Redundant Array of Inexpensive Disks) solutions is starting to break down. While I think it's important for those of us who spend our days thinking about these things to raise the alarm, it's more important to think and write about the technologies that can take us past parity RAID. One major contender is Reed-Solomon erasure codes, which vendors are starting to use as an efficient alternative to parity or mirroring.

As we've previously discussed, the problem with parity RAID is that disk drives have been getting bigger much faster than they've been getting faster or more reliable. In just a few years, we'll be buying 10TB (8x10^13 bits) drives that will take 10 hours or more to read end to end, turning a RAID rebuild into a multi-day event with a high probability of failure due to a read error on one of the other drives.
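To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 10TB capacity and the 10 hour read time come from the paragraph above; the sustained transfer rate, the unrecoverable read error (URE) rates and the 8-drive RAID-5 set are my own illustrative assumptions, not figures for any particular drive.

```python
import math

CAPACITY_BYTES = 10 * 10**12  # a 10TB drive, as discussed above


def read_hours(capacity_bytes, bytes_per_second):
    """Hours needed to read a drive end to end at a sustained rate."""
    return capacity_bytes / bytes_per_second / 3600


def rebuild_read_error_probability(capacity_bytes, surviving_drives, ure_per_bit):
    """Chance of at least one unrecoverable read error while reading
    every bit of every surviving drive during a rebuild."""
    bits_read = capacity_bytes * 8 * surviving_drives
    # 1 - (1 - p)^bits, computed with log1p/expm1 to keep precision for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # ASSUMPTION: about 278 MB/s sustained, which is what a 10 hour read implies
    print(f"end-to-end read: {read_hours(CAPACITY_BYTES, 278e6):.1f} hours")
    # ASSUMPTION: 8-drive RAID-5 set, so 7 surviving drives must be read in full
    for ure in (1e-14, 1e-15):  # ASSUMPTION: illustrative URE rates per bit read
        p = rebuild_read_error_probability(CAPACITY_BYTES, 7, ure)
        print(f"URE rate {ure:g} per bit: P(read error during rebuild) = {p:.1%}")
```

The model treats read errors as independent events, which is a simplification, but it shows how quickly the odds turn against a rebuild as capacity grows.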

Reed-Solomon erasure codes (which got their name from their original use as a forward error correction method for sending data over an unreliable channel that may fail to transfer, or erase, some data) can extend the data protection model from RAID-5's simplistic n+1 to substantially higher levels of protection. Rather than separating the data from error correction or check data as parity and CRCs do, erasure codes expand the data, adding redundancy so that even if a portion of the data is mangled or lost, the original data can be retrieved from the remaining portion.
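The "any k of n" property is easier to see in a toy example. The sketch below is not what any vendor ships: it is a Reed-Solomon-style code built on polynomial interpolation over the small prime field GF(257), where production implementations typically work in GF(2^8) so every symbol fits in a byte. The names encode, decode and _interpolate are mine, chosen for illustration, and pow(den, -1, P) needs Python 3.8 or later.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Expand k data symbols into n (index, value) fragments.
    Fragments 0..k-1 carry the data itself; k..n-1 are extra evaluations."""
    k = len(data)
    points = list(enumerate(data))
    return points + [(x, _interpolate(points, x)) for x in range(k, n)]


def decode(fragments, k):
    """Rebuild the k data symbols from any k surviving fragments."""
    if len(fragments) < k:
        raise ValueError("not enough fragments to reconstruct the data")
    points = fragments[:k]
    return [_interpolate(points, x) for x in range(k)]


if __name__ == "__main__":
    data = list(b"RAID")               # k = 4 data symbols
    fragments = encode(data, 6)        # n = 6, so any two fragments can be lost
    survivors = [fragments[1], fragments[3], fragments[4], fragments[5]]
    print(bytes(decode(survivors, 4)))
```

This toy keeps the original symbols as the first k fragments (a "systematic" layout) purely to keep the code short; the recovery math is the same whichever k fragments survive.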

Erasure codes have been around for decades, in applications ranging from data transmissions from deep space probes, where latency of several minutes makes a TCP-style timeout and retransmit impractical, to CDs and DVDs, which use erasure codes to handle dust, scratches and other impairments of the vulnerable disc. They've even quietly made it into enterprise storage, as vendors use Reed-Solomon math to calculate the ECC (Error Correcting Codes) for their n+2 RAID-6 implementations.

Erasure codes get really interesting, however, when we up the ante beyond n+2, as several vendors have. NEC's HydraStor deduplicating grid system uses erasure codes to spread each data chunk across twelve disk drives in the grid. With a protection level of 9 of 12, the original data can be reconstructed from any nine of the twelve fragments. HydraStor users can set the protection level as high as 6 of 12, which would give up the same 50% of raw capacity as mirroring but still be able to deliver the data after six drive failures.
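The trade-off between protection level and capacity is simple arithmetic. A small sketch, using a helper name (protection_profile) that is mine rather than anything from NEC's product, and treating a two-way mirror as a "1 of 2" code for comparison:

```python
def protection_profile(k, n):
    """Summarize a 'k of n' protection level: any k of the n fragments
    are enough to rebuild the data."""
    return {
        "tolerated_failures": n - k,       # fragments that can be lost
        "overhead_vs_data": (n - k) / k,   # extra storage relative to the data
        "share_of_raw": (n - k) / n,       # fraction of raw capacity spent on redundancy
    }


if __name__ == "__main__":
    for label, k, n in [("9 of 12", 9, 12), ("6 of 12", 6, 12), ("mirror (1 of 2)", 1, 2)]:
        prof = protection_profile(k, n)
        print(f"{label}: survives {prof['tolerated_failures']} failures, "
              f"{prof['overhead_vs_data']:.0%} over the data, "
              f"{prof['share_of_raw']:.0%} of raw capacity")
```

Note that "overhead" can be quoted either against the data or against raw capacity; a mirror is 100% by the first measure and 50% by the second, so it pays to check which one a vendor means.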

Cleversafe has extended erasure coding to add location information, creating what they call dispersal coding. This lets them ensure that blocks are stored not just on different disk drives or different nodes of their RAIN cluster, but even in different data centers. Using their default coding, which requires 10 of the 16 chunks created for each data stripe to reconstruct the original data, you can tell their system to disperse the data across three data centers and still be able to read the data if one data center goes offline, with redundancy taking up less than 40% of raw capacity (60% more storage than the data itself). A typical replicated solution in three data centers would have 200% overhead.
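Why a 10 of 16 code survives the loss of a whole data center comes down to how the chunks are placed. The sketch below assumes the 16 chunks are spread as evenly as possible across the three sites; that placement policy and the function names are my assumptions for illustration, not a description of Cleversafe's actual algorithm.

```python
def place_round_robin(n_chunks, n_sites):
    """ASSUMED placement: deal chunks out to sites one at a time,
    returning how many chunks each site ends up holding."""
    counts = [0] * n_sites
    for i in range(n_chunks):
        counts[i % n_sites] += 1
    return counts


def survives_site_loss(k, counts):
    """True if at least k chunks remain readable after losing any one site."""
    total = sum(counts)
    return all(total - lost >= k for lost in counts)


if __name__ == "__main__":
    K, N, SITES = 10, 16, 3
    counts = place_round_robin(N, SITES)
    print("chunks per data center:", counts)
    print("readable with one data center down:", survives_site_loss(K, counts))
    print(f"dispersal overhead vs data: {(N - K) / K:.0%}")
    print(f"three full replicas vs data: {(SITES - 1) / 1:.0%}")
```

With an uneven placement such as 8, 4 and 4 chunks, losing the largest site would leave only 8 chunks, two short of the 10 needed, which is why the code needs to know where chunks live rather than just how many exist.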
