Network Computing is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

RAID: I'm Not Dead Yet

The other day a message showed up in my inbox with "Is RAID Dead?" in the subject line. While I understand, and agree, that expectations and advances in technology have passed simple RAID 5 by, I'm getting tired of the "XYZ Technology is Dead" meme. At various times we've been told tape, the PC, and e-mail were all dead. If calling RAID dead is an overstatement, what is the state of RAID?

While most system and storage administrators embraced parity RAID, there's always been a small, if vocal, group that objected. BAARF (Battle Against Any RAID Five) members, who seem mostly to be DBAs, have long contended that all drives should be mirrored. They say RAID 10 sets can survive multiple drive failures and are faster, so why use parity (especially since the cost doesn't come out of their budget)? The answer is that the folks who write the checks would rather have 20 to 30 percent overhead with parity RAID than 50 percent overhead for mirrors.
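The overhead figures are simple ratios of redundant capacity to raw capacity. This minimal Python sketch computes them for a few layouts; the set widths (4+1, 3+1, 6+2) are illustrative assumptions, not configurations the article recommends.

```python
def parity_overhead(data_drives: int, parity_drives: int) -> float:
    """Fraction of raw capacity spent on parity in a data+parity RAID set."""
    return parity_drives / (data_drives + parity_drives)


def mirror_overhead(copies: int = 2) -> float:
    """Fraction of raw capacity spent on extra copies in a mirrored (RAID 1/10) set."""
    return (copies - 1) / copies


# Illustrative set widths only; real arrays use many others.
layouts = {
    "RAID 5, 4+1": parity_overhead(4, 1),
    "RAID 5, 3+1": parity_overhead(3, 1),
    "RAID 6, 6+2": parity_overhead(6, 2),
    "RAID 10, two-way mirror": mirror_overhead(2),
}

for name, overhead in layouts.items():
    print(f"{name}: {overhead:.0%} of raw capacity")
```

Narrower parity sets raise the overhead, while a two-way mirror stays at 50 percent no matter how many drives you add.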

Reasonable arguments that the day of parity RAID is (or should be) over rest on ever-longer projected rebuild times for failed disks. Disk capacities have been growing faster than data transfer rates for the past 20 years or more, so the time needed to read or write an entire drive keeps rising. This is especially true for the capacity-oriented drives that typically have SATA interfaces.

In 1988, when Patterson, Gibson, and Katz codified and named RAID, the "inexpensive" drive they used as an example was a Conner Peripherals CP3100 105MByte SCSI drive. The CP3100 had a data transfer rate of 1.2MBytes per second, so writing rebuild data to a hot spare would take less than two minutes. By comparison, a Seagate Constellation ES 2TByte drive has a sustained transfer rate of 60MBytes to 150MBytes per second, which puts the theoretical minimum time to write the full disk at approximately four hours.
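The theoretical minimum is just capacity divided by sustained transfer rate. This small Python sketch reproduces that arithmetic using decimal units (as drive vendors quote them) and assumes the drive can write at a constant rate with nothing else competing for it.

```python
MB = 10**6   # decimal units, as drive vendors quote capacity and transfer rate
TB = 10**12


def full_write_seconds(capacity_bytes: float, rate_bytes_per_s: float) -> float:
    """Theoretical minimum rebuild time: write every byte of the replacement drive once."""
    return capacity_bytes / rate_bytes_per_s


# Conner CP3100: 105 MBytes at 1.2 MBytes per second
cp3100 = full_write_seconds(105 * MB, 1.2 * MB)
print(f"CP3100: {cp3100:.1f} s ({cp3100 / 60:.1f} minutes)")

# Seagate Constellation ES: 2 TBytes at 60 to 150 MBytes per second sustained
for rate in (150, 60):
    hours = full_write_seconds(2 * TB, rate * MB) / 3600
    print(f"Constellation ES at {rate} MBytes/s: {hours:.1f} hours")
```

The four-hour figure corresponds to the top of the transfer-rate range (about 3.7 hours); at the bottom of the range the same arithmetic gives a bit over nine hours.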

Even if sustained transfer rates inch up to 250MBytes per second, as capacities hit 10TBytes per drive, rebuild times will continue to climb until they take days, not hours. One way to reduce the exposure to long rebuilds is to distribute an array's data, parity, and spare capacity across a large number of drives, as Xiotech, HP/3Par, Compellent, and others do. That way, the rebuild isn't limited to reading from the five to 10 other drives that were in the RAIDset with the failed drive and writing to a single designated hot spare. Instead, all the drives in the array take part in the rebuild, which moves the bottleneck from the hot spare drive to the much more capable RAID controller(s).
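A rough model makes the bottleneck shift concrete. The Python sketch below is a simplification under hypothetical assumptions: rebuilding each byte costs one read from every other member of the set plus one write, work spreads perfectly evenly over the surviving drives, the controller has a fixed aggregate throughput (the 4,000MBytes per second and 100-drive array used here are invented figures), and no production I/O competes. It is not how any of the vendors named above actually schedule rebuilds.

```python
from typing import Optional


def rebuild_hours(capacity_tb: float, drive_mb_s: float, set_width: int,
                  controller_mb_s: float, array_drives: Optional[int] = None) -> float:
    """Rough lower bound, in hours, on rebuilding one failed drive (hypothetical model).

    array_drives=None models a dedicated hot spare; otherwise spare capacity is
    assumed to be spread evenly over every drive in the array.
    """
    capacity = capacity_tb * 1e12
    drive_rate = drive_mb_s * 1e6
    # Each rebuilt byte needs (set_width - 1) reads plus 1 write through the controller.
    total_io = capacity * set_width
    controller_limit = total_io / (controller_mb_s * 1e6)
    if array_drives is None:
        # Every reconstructed byte lands on the single spare drive.
        drive_limit = capacity / drive_rate
    else:
        # The same work is shared by all surviving drives in the array.
        drive_limit = total_io / ((array_drives - 1) * drive_rate)
    return max(drive_limit, controller_limit) / 3600


# 10-TByte drives at 250 MBytes/s in 5+1 sets; controller speed and array size are invented.
dedicated = rebuild_hours(10, 250, set_width=6, controller_mb_s=4000)
distributed = rebuild_hours(10, 250, set_width=6, controller_mb_s=4000, array_drives=100)
print(f"dedicated spare:     {dedicated:.1f} hours")
print(f"distributed sparing: {distributed:.1f} hours")
```

In this toy model the dedicated spare caps the rebuild at about 11 hours before any production load is added, while distributed sparing leaves the controller as the limiting factor (about 4.2 hours with the invented controller figure), which is the shift the paragraph describes.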

While a second disk failure during a rebuild is a scary thought, I worry more about unrecoverable read errors during the rebuild. Today's disk drives, if they live up to their spec sheets, will fail to read a sector about once every 10TBytes to 100TBytes read. Rebuilding a 5+1 RAID 5 set of 2TByte drives means reading 10TBytes from the five surviving drives, so with drives at the 10TByte end of that range you're more likely than not to hit a read failure. How your RAID controller deals with that read error will determine just how big a problem you have. With most RAID controllers, the rebuild will fail on the read error. In the worst case, the controller will then take the RAIDset offline, and the inability to read one sector will mean you have to restore 10TBytes from backup.
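Spec sheets usually state the unrecoverable read error (URE) rate as one error per 10^14 bits read for capacity-oriented drives and one per 10^15 bits for many enterprise drives, roughly the 10TBytes to 100TBytes range above. This Python sketch estimates the chance of at least one URE during the 5+1 rebuild, assuming errors occur independently at exactly the spec-sheet rate (a Poisson model that real drives need not follow).

```python
import math


def p_read_failure(bytes_read: float, bits_per_error: float) -> float:
    """Chance of at least one unrecoverable read error while reading bytes_read.

    Assumes independent errors at the spec-sheet rate (Poisson model).
    """
    expected_errors = bytes_read * 8 / bits_per_error
    return 1 - math.exp(-expected_errors)


# Rebuilding a 5+1 RAID 5 set of 2-TByte drives reads all five survivors.
bytes_read = 5 * 2e12

for bits_per_error in (1e14, 1e15):  # about 12.5 and 125 TBytes per error
    p = p_read_failure(bytes_read, bits_per_error)
    print(f"1 error per {bits_per_error:.0e} bits: {p:.0%} chance of a URE")
```

Under these assumptions that works out to roughly a 55 percent chance of at least one error with 10^14-class drives and about 8 percent with 10^15-class drives.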

In better cases, the rebuild will continue and a cryptic message will be posted to the system log saying that block 455352 of LUN 2e3da42f354f25 is unreadable. You'll find out that block was in the middle of the president's presentation to the board when she can't open it, 10 minutes before she goes on stage.
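A RAID 5 rebuild reconstructs each of the failed drive's blocks by XORing the corresponding data and parity blocks on the surviving drives, so a single unreadable survivor block leaves that stripe unrecoverable. This minimal Python sketch shows the "better case" behavior: it logs the bad stripe and keeps going instead of aborting. The in-memory lists, the use of `None` to mark an unreadable sector, and the function names are hypothetical; real controllers work on block addresses and report errors in their own formats.

```python
from functools import reduce
from typing import List, Optional, Tuple

# Hypothetical convention for this sketch: None marks a sector the drive could not read.
Block = Optional[bytes]


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 parity and reconstruction)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_failed_drive(
    surviving_stripes: List[List[Block]],
) -> Tuple[List[Block], List[int]]:
    """Rebuild the failed drive's block in each stripe from the surviving blocks.

    Under single parity, a stripe with an unreadable surviving block cannot be
    reconstructed, so the stripe number is recorded and the rebuild continues.
    """
    rebuilt: List[Block] = []
    bad_stripes: List[int] = []
    for stripe_no, stripe in enumerate(surviving_stripes):
        if any(block is None for block in stripe):
            bad_stripes.append(stripe_no)
            rebuilt.append(None)
        else:
            rebuilt.append(xor_blocks(stripe))
    return rebuilt, bad_stripes


# A 3+1 stripe: parity is the XOR of the three data blocks.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = xor_blocks([d0, d1, d2])

# The drive holding d1 fails; in stripe 1 a survivor returns an unreadable sector.
stripes = [[d0, d2, parity], [d0, None, parity]]
rebuilt, bad = rebuild_failed_drive(stripes)
print(rebuilt[0] == d1, bad)  # True [1]
```

The list of bad stripes plays the role of the cryptic log message: the array stays online, but whatever file occupied that stripe is now damaged.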

I wouldn't go nearly so far as to say RAID 5 is dead, but it's off my recommended list. RAID 6 will keep our data safe enough for the time being, but double parity alone won't solve the problem once we get to 10TByte and larger drives. By then we'll need new technologies, but I'm pretty sure we'll still call them RAID. After all, if Ethernet can lose CSMA/CD and still be Ethernet, RAID can still be RAID without parity.