It's Not The Same Old Block Storage


Howard Marks

September 24, 2010

2 Min Read

As recently as the turn of the century, block storage was pretty simple. A controller joined a group of disk drives into a RAID set and then offered fixed slices of that RAID set up as logical volumes, or LUNs. All the controller had to do to map a block in a volume to a physical location was calculate the offset from the beginning of the RAID set to the beginning of the volume and work out which RAID stripe the block fell in. With features like thin provisioning, automatic tiering and data reduction via deduplication and compression, things aren't that simple anymore.
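To make the old model concrete, here's a minimal sketch of that mapping arithmetic, assuming a plain striped layout and ignoring parity rotation; the block size, stripe unit, drive count and function names are illustrative, not any vendor's actual on-disk format.

```python
# Sketch of old-style LBA-to-disk mapping for a simple striped RAID set.
# All parameters here are made up for illustration.

STRIPE_UNIT = 128           # blocks written to one drive before moving to the next
DATA_DRIVES = 4             # drives in the stripe (parity rotation ignored)

def map_block(volume_offset_blocks: int, lba: int):
    """Map a volume LBA to (drive index, block offset on that drive)."""
    # Step 1: offset from the start of the RAID set to the requested block.
    raidset_block = volume_offset_blocks + lba
    # Step 2: which strip, and therefore which drive, holds it.
    strip_number = raidset_block // STRIPE_UNIT
    drive = strip_number % DATA_DRIVES
    stripe_row = strip_number // DATA_DRIVES
    # Step 3: block offset within that drive.
    drive_block = stripe_row * STRIPE_UNIT + raidset_block % STRIPE_UNIT
    return drive, drive_block

# e.g. a volume carved out 1,000,000 blocks into the RAID set:
print(map_block(1_000_000, 42))   # -> (drive index, block on that drive)
```

A couple of multiplications and divisions per I/O, and no per-block state to keep: that's why the old controllers could get away with so little memory and CPU.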

When you hear block storage vendors use lines like "Unified storage systems have all that file system overhead because they store iSCSI or Fibre Channel LUNs as files," they're underestimating how much file-system-like metadata today's block storage arrays have to track themselves.

To run any of these new RAID+ features, an array controller needs a metadata store that looks a lot like a file system to me. Take data deduplication: the array has to break data into chunks of 4KB to 1MB, figure out which of those chunks hold the same data, and then build a pointer list mapping LUN logical block addresses to the stored chunks.
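A toy sketch of that metadata, under the assumptions in the paragraph above: fixed-size chunks, a fingerprint per chunk, and a per-LUN map from logical block address to stored chunk. The names (chunk_store, lun_map and so on) are hypothetical, not any array's real internals.

```python
import hashlib

CHUNK_SIZE = 4096            # could be anywhere from 4KB to 1MB

chunk_store = {}             # fingerprint -> stored chunk (stands in for a physical location)
lun_map = {}                 # (lun, lba) -> fingerprint

def write(lun: str, lba: int, data: bytes) -> None:
    """Store one chunk, deduplicating against everything already written."""
    fingerprint = hashlib.sha256(data).hexdigest()
    if fingerprint not in chunk_store:       # new data: store the chunk once
        chunk_store[fingerprint] = data
    lun_map[(lun, lba)] = fingerprint        # always record the pointer

def read(lun: str, lba: int) -> bytes:
    """Follow the pointer list back to the stored chunk."""
    return chunk_store[lun_map[(lun, lba)]]

# Two LUNs writing identical chunks consume one stored copy but two pointers.
write("lun0", 0, b"A" * CHUNK_SIZE)
write("lun1", 0, b"A" * CHUNK_SIZE)
assert len(chunk_store) == 1 and len(lun_map) == 2
```

Every block on every LUN now needs a pointer entry, which is exactly the kind of per-object bookkeeping a file system does for its files.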

Automated tiering requires even more metadata, since the system has to map logical block addresses to chunks that live on different RAID sets on different types of storage. On top of that, it has to collect access-frequency metadata so it can figure out which hot chunks should be promoted to a faster tier of storage and which cool chunks can be demoted.
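Here's a rough sketch of that extra bookkeeping, assuming just two tiers and a simple "promote the most-touched chunks" policy; the tier names, slot count and data structures are all illustrative.

```python
from collections import Counter

chunk_tier = {}             # chunk id -> "ssd" or "hdd"
access_counts = Counter()   # chunk id -> I/Os since the last rebalance
SSD_SLOTS = 2               # how many chunks fit on the fast tier

def record_io(chunk_id: int) -> None:
    """Every read or write bumps the access-frequency metadata."""
    access_counts[chunk_id] += 1
    chunk_tier.setdefault(chunk_id, "hdd")     # new chunks land on the slow tier

def rebalance() -> None:
    """Periodic job: promote the hottest chunks, demote the rest."""
    hot = {cid for cid, _ in access_counts.most_common(SSD_SLOTS)}
    for cid in chunk_tier:
        chunk_tier[cid] = "ssd" if cid in hot else "hdd"
    access_counts.clear()                      # start a fresh measurement window

for cid in [1, 1, 1, 2, 2, 3]:                 # chunks 1 and 2 are hot, 3 is cool
    record_io(cid)
rebalance()
print(chunk_tier)                              # {1: 'ssd', 2: 'ssd', 3: 'hdd'}
```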

What does this mean for a poor storage admin? Well, first of all, it explains why disk arrays need bigger processors to deliver good performance. Your vendor may not be blowing smoke when he says you can't run tiering or compression on your old system because it doesn't have the horsepower. It also means vendors that designed their systems to use chunks for data protection, mapping, say, five data chunks and one parity chunk across some set of drives rather than assigning whole drives to RAID sets, the way Compellent, 3Par and to some extent HP's EVA do, have the inside track over conventional designs that need abstraction layers bolted on to do thin provisioning and the rest.
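A hedged sketch of what chunk-based protection looks like: instead of fixed RAID sets, each stripe picks six drives from the whole pool and lays down five data chunks plus one XOR parity chunk. The drive count and placement policy here are invented for illustration, not how any of those vendors actually do it.

```python
from functools import reduce

DRIVES = 12
drive_used = [0] * DRIVES             # chunks already placed on each drive
stripe_map = {}                       # stripe id -> list of (drive, role)

def place_stripe(stripe_id: int, data_chunks: list) -> None:
    """Place 5 data chunks + 1 parity chunk on the 6 least-full drives."""
    assert len(data_chunks) == 5
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), data_chunks)
    # Choose the six emptiest drives from the pool rather than a fixed RAID set.
    targets = sorted(range(DRIVES), key=lambda d: drive_used[d])[:6]
    layout = []
    for drive, chunk in zip(targets, data_chunks + [parity]):
        drive_used[drive] += 1
        layout.append((drive, "parity" if chunk is parity else "data"))
    stripe_map[stripe_id] = layout    # yet more metadata the controller must keep

place_stripe(0, [bytes([i]) * 4096 for i in range(5)])
print(stripe_map[0])
```

The flexibility comes at the cost of a per-stripe placement map, which is precisely the metadata a conventional offset-arithmetic design never needed.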

But the biggest impact is that on a wide-striped, deduped, thin-provisioned, auto-tiered array there is no such thing as sequential I/O. Your data isn't on the same track of the drives in the RAID set; it's wherever the array controller decided was the best place for it. Cache, flash and smart software allow the system to deliver data as if it were sequential, but old tools like defragmenting a Windows volume aren't doing any good anymore.

Welcome to the 21st century. Please keep your hands in the car at all times. "The way we've always done it" isn't good enough any more.

About the Author(s)

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real-world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at: http://www.deepstorage.net/NEW/GBoS
