On Large Drives And RAID

Howard Marks

August 24, 2009

Xyratex's recent announcement that they've qualified Hitachi's 2 TB nearline drive for their disk systems got me thinking about how the RAID techniques of the past don't really address the needs of systems with many ginormous drives. As drives get bigger I worry that basic RAID-5 protection isn't sufficient for these beasts. 

For a company that isn't a household name in even the geekiest of households, Xyratex plays a strategic role in the storage industry. Many of the big names in the business, most significantly former parent IBM, OEM Xyratex RAID arrays as their low- to midrange products. Even more vendors use Xyratex as a supplier of JBODs and SBODs or as a contract manufacturer. We should start seeing 2TB drives in arrays from better-known vendors over the next few months.

My concerns aren't based on the quality or failure rate of big drives, but on the time it must take to rebuild onto a hot spare after a drive failure. Just as scary, the quantity of data that has to be read, processed and written to the replacement drive is creeping into the same range as the drive's unrecoverable error rate, as the math below shows.

In the few short years that capacity-oriented drives, mostly but not exclusively with SATA interfaces, have worked their way into the data center, their capacity has increased eightfold while their throughput has barely doubled. The ~130MB/s sustained data transfer rate that 1 and 2TB drives deliver is sufficient for the backup and archiving applications enterprises use them for.

However, even if a RAID controller could rebuild a failed drive at 130MB/s, it would take over 4 hours to rebuild a 2TB drive. In the real world, I'd expect it to take at least 12 hours, even longer if the array is busy, since rebuilding is a lower priority task.
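If you want to check that math, here's the back-of-the-envelope version in a few lines of Python. The 2TB capacity and 130MB/s rate are the figures above; the slowdown factor for a busy array is purely illustrative:

```python
# Best-case rebuild time for a 2TB drive at a sustained 130MB/s.
DRIVE_BYTES = 2 * 10**12        # 2TB, as drive vendors count it
SUSTAINED_BPS = 130 * 10**6     # ~130MB/s sustained transfer rate

best_case_hours = DRIVE_BYTES / SUSTAINED_BPS / 3600
print(f"Best-case rebuild: {best_case_hours:.1f} hours")      # ~4.3 hours

# If production I/O leaves the rebuild only a third of the drive's
# bandwidth (an illustrative guess, not a measurement), you're past
# the 12-hour mark.
print(f"Throttled rebuild: {best_case_hours * 3:.1f} hours")   # ~12.8 hours
```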

With an MTBF of 1.2 million hours, one could be lulled into a false sense of security by calculating that the probability of 2 of the 5-20 drives in a RAID set failing is somewhat lower than that of winning the Publisher's Clearinghouse Sweepstakes. But someone wins the sweepstakes every year. Drive failures come in bunches because the environmental problems, either in manufacturing or in deployment, that cause drive failures affect not just one drive but often a whole array or data center.

Systems like 3Par's and Xiotech's Emprise that virtualize RAID data and spare space across all their drives should rebuild a big drive faster. Using 2TB of spare space spread across many drives, rather than a dedicated spare drive, eliminates the bottleneck at the single spare.
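To see why that matters, here's an idealized sketch of the arithmetic. It assumes the rebuild writes spread evenly across every drive donating spare space, which is a simplification, not a description of how 3Par or Xiotech actually implement it:

```python
# Idealized comparison: writing 2TB of reconstructed data to a single
# dedicated hot spare vs. spreading it across N drives that each donate
# a slice of spare space.
DRIVE_BYTES = 2 * 10**12        # 2TB drive being rebuilt
SUSTAINED_BPS = 130 * 10**6     # ~130MB/s sustained per drive

def rebuild_write_hours(spare_targets: int) -> float:
    """Hours to absorb the rebuild writes when spread over N target drives."""
    return (DRIVE_BYTES / spare_targets) / SUSTAINED_BPS / 3600

print(f"Dedicated hot spare:      {rebuild_write_hours(1):.1f} hours")   # ~4.3 hours, write-limited
print(f"Spare space on 20 drives: {rebuild_write_hours(20):.2f} hours")  # ~0.2 hours; writes stop being the bottleneck
```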

Even if another drive doesn't fail, the rebuild may fail when the controller can't read a sector from one of the remaining drives in the array. A 2TB drive holds 1.6x10^13 bits and has a published error rate of 1 unrecoverable error in 10^15 bits read. Rebuilding a 4+1 RAID-5 array means reading and/or writing 8x10^13 bits, which works out to roughly an 8% chance that an error will occur during the rebuild. Depending on how gracefully the controller handles the error, that means either a failed rebuild or a message in the array logs that block 32322 of LUN 2232 couldn't be reconstructed. Either way, it's bad.

Switching to mirroring reduces the odds of data loss to 1.6% per rebuild, but that's still not good enough.
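For those who like to see the arithmetic, here's the error-rate calculation for both cases. The only inputs are the 1.6x10^13 bits on a 2TB drive and the published error rate of 1 in 10^15 bits:

```python
import math

BITS_PER_DRIVE = 1.6e13   # 2TB * 8 bits per byte
BIT_ERROR_RATE = 1e-15    # one unrecoverable error per 10^15 bits

def p_unrecoverable_error(bits_touched: float) -> float:
    """Probability of at least one unrecoverable error while touching this many bits."""
    return 1 - math.exp(bits_touched * math.log1p(-BIT_ERROR_RATE))

# 4+1 RAID-5 rebuild: reading 4 surviving drives and writing the
# replacement touches 8x10^13 bits (the article's count).
print(f"4+1 RAID-5 rebuild: {p_unrecoverable_error(5 * BITS_PER_DRIVE):.1%}")  # ~7.7%, call it 8%

# Mirror rebuild: read the single surviving copy, 1.6x10^13 bits.
print(f"Mirror rebuild:     {p_unrecoverable_error(BITS_PER_DRIVE):.1%}")      # ~1.6%
```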

The moral of the story, dear readers, is that big drives require some sort of multiple-parity or RAID-6 protection, especially if you're using them to hold deduped data, where losing a few blocks could affect many data objects.

About the Author

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M. and concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at: http://www.deepstorage.net/NEW/GBoS
