It's Not The Same Old Block Storage
September 24, 2010
As recently as the turn of the century, block storage was pretty simple. A controller joined a group of disk drives into a RAID set and then offered fixed slices of that RAID set up as a logical volume or LUN. All the controller had to do to map a block in the volume to a physical location was calculate the offset from the beginning of the RAID set to the beginning of the volume and the RAID stripe. With features like thin provisioning, automatic tiering and data reduction via deduplication and compression, things aren't that simple anymore.
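To make the contrast concrete, here is a minimal sketch of that "simple" mapping: a LUN carved out of a RAID set at a fixed offset, so a logical block address translates to a physical location with nothing but arithmetic. The stripe size, drive count and function names are illustrative assumptions, not any vendor's actual layout.

STRIPE_UNIT_BLOCKS = 128      # blocks written to one drive before moving to the next
DATA_DRIVES = 5               # data drives in the RAID set (parity rotation ignored here)

def map_lba(lun_start_offset, lba):
    """Translate a LUN logical block address to (drive index, block on drive)."""
    raidset_block = lun_start_offset + lba                         # offset from start of RAID set
    stripe_number = raidset_block // (STRIPE_UNIT_BLOCKS * DATA_DRIVES)
    within_stripe = raidset_block % (STRIPE_UNIT_BLOCKS * DATA_DRIVES)
    drive = within_stripe // STRIPE_UNIT_BLOCKS
    block_on_drive = stripe_number * STRIPE_UNIT_BLOCKS + within_stripe % STRIPE_UNIT_BLOCKS
    return drive, block_on_drive

# Example: block 1000 of a LUN that starts 4096 blocks into the RAID set
print(map_lba(4096, 1000))

No lookup tables, no metadata to persist; the controller recomputes the answer on every I/O. That is exactly what the newer features take away.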
When you hear block storage vendors use lines like "Unified storage systems have all that file system overhead because they store iSCSI or Fibre Channel LUNs as files," they're underestimating the file-system-like metadata that today's block storage arrays have to track themselves.
To run any of the new RAID+ features, an array controller needs a metadata store that looks a lot like a file system to me. Take the case of data deduplication. The array has to break data into chunks of 4KB to 1MB, figure out which of those chunks store the same data and then build a pointer list mapping LUN logical block addresses to the stored chunks.
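A minimal sketch of that bookkeeping, assuming fixed 4KB chunks and SHA-256 fingerprints (real arrays use variable chunk sizes and far more elaborate metadata structures):

import hashlib

CHUNK = 4096
chunk_store = {}      # fingerprint -> stored chunk (stands in for a physical location)
lba_map = {}          # (lun_id, logical block address) -> fingerprint

def write(lun_id, lba, data):
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(fp, chunk)          # store each unique chunk only once
        lba_map[(lun_id, lba + i // CHUNK)] = fp   # pointer from LBA to the shared chunk

def read(lun_id, lba):
    return chunk_store[lba_map[(lun_id, lba)]]

# Two LUNs writing identical data end up pointing at one stored chunk
write(1, 0, b"A" * CHUNK)
write(2, 0, b"A" * CHUNK)
print(len(chunk_store))   # 1

Even in this toy version, every block written means a hash computed and two metadata structures updated, and every block read means a lookup before the controller knows where the data actually lives.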
Automated tiering requires even more metadata, as the system has to map logical block addresses to chunks that live on different RAID sets on different types of storage. On top of that, the system has to collect access frequency metadata to figure out which hot chunks should be promoted to a faster tier of storage and which cool chunks can be demoted.
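A toy illustration of that extra metadata: every chunk carries its current tier and an access counter, and a periodic background pass promotes hot chunks and demotes cold ones. The tier names and thresholds are made up for the example.

from dataclasses import dataclass

@dataclass
class ChunkMeta:
    tier: str = "sata"     # where the chunk currently lives
    accesses: int = 0      # access-frequency metadata gathered on every I/O

chunks = {lba: ChunkMeta() for lba in range(8)}

def io(lba):
    chunks[lba].accesses += 1      # the controller counts each read/write

def rebalance(promote_at=5, demote_at=1):
    for meta in chunks.values():
        if meta.accesses >= promote_at:
            meta.tier = "ssd"       # hot chunk moves up
        elif meta.accesses <= demote_at:
            meta.tier = "sata"      # cool chunk moves down
        meta.accesses = 0           # reset counters for the next sampling period

for _ in range(6):
    io(0)                           # LBA 0 gets hammered
rebalance()
print(chunks[0].tier, chunks[3].tier)   # ssd sata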
What does this mean for a poor storage admin? Well, first of all, it explains why disk arrays need bigger processors to deliver good performance. Your vendor may not be blowing smoke when he says you can't run tiering or compression on your old system because it doesn't have the horsepower. It also means vendors like Compellent, 3PAR and, to some extent, HP's EVA, which designed their systems to use chunks for data protection, mapping five data chunks and one parity chunk across a set of drives rather than assigning whole drives to RAID sets, have the inside track over conventional designs that need abstraction layers bolted on to do thin provisioning and the like.
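For a rough picture of what "chunks for data protection" means, here is a sketch in which each stripe of five data chunks plus one parity chunk is placed on whichever six drives in the pool the allocator picks, instead of living on a fixed RAID set. The pool size and the simple rotation scheme are assumptions for illustration only.

POOL = 24                  # drives in the pool
CHUNKS_PER_STRIPE = 6      # 5 data chunks + 1 parity chunk

def place_stripe(stripe_number):
    """Return the six drives holding this stripe's chunks (simple rotation)."""
    start = (stripe_number * CHUNKS_PER_STRIPE) % POOL
    return [(start + i) % POOL for i in range(CHUNKS_PER_STRIPE)]

for s in range(3):
    print(f"stripe {s}: drives {place_stripe(s)}")

Because the mapping is per-chunk rather than per-drive, allocating thin, moving a chunk to another tier or rebuilding after a drive failure is just another metadata update, not a restructuring of a whole RAID set.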
But the biggest impact is that on a wide-striped, deduped, thin provisioned, auto-tiered array there is no such thing as sequential I/O. Your data isn't on the same track of the drives in the RAID set; it's wherever the array controller decided was the best place for it. Cache, flash and smart software allow the system to deliver data as if it were sequential, but old tools like defragmenting a Windows volume aren't doing any good anymore.
Welcome to the 21st century. Please keep your hands in the car at all times. "The way we've always done it" isn't good enough any more.