
'Only The Blocks That Have Changed' And Other Platitudes

Many of the technologies we rely on in the storage world today use one form of changed block tracking or another. Snapshots, replication (especially the point-in-time kind), automated tiering and data deduplication all work by identifying changed, or different, blocks and treating them in some special way. The problem is that while parts may be parts, blocks are most definitely not blocks.

Part of the problem is that when storage guys hear the term "block" they immediately think of 512-byte SCSI blocks and assume that copying, moving and storing only the blocks that have changed must be an efficient process. Unfortunately for us, when storage systems replicate data or take snapshots, the blocks they move around are more like file system allocation units than SCSI blocks, and are usually a lot bigger than 512 bytes. As a result, users frequently discover that they need more snapshot space and WAN bandwidth than they expected in order to use the cool features of their storage systems.

The problem is due in part to the fact that storage folks use the term block for everything, much the way network guys informally say "packet" even though they have more exact terms like frame, datagram and segment. While some of us use "chunk" for these larger units of data, just about every presentation I see includes the magic phrase "only the blocks that have changed."

The size of the chunk a storage system uses as its allocation unit varies widely, and it can have a significant impact on how efficiently the chunk-based function you're looking at will run. If you're running a SQL Server database application that does a lot of random database updates, as many do, each record you update causes SQL Server to write an updated 8KByte page to disk.

If your storage system uses 4KByte chunks, as NetApp's WAFL does, each 8KByte SQL Server page update will cause the system to store two chunks. If your system uses 16MByte chunks, as some do, then a single 8KByte database update will take up 16MBytes of snapshot space, consume 16MBytes of WAN bandwidth to replicate and occupy 16MBytes of expensive flash memory when migrated to tier 0.
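To put rough numbers on this, here's a minimal back-of-the-envelope sketch in Python. The chunks_touched helper and the chunk sizes in the loop are my own illustrative assumptions, not any vendor's actual implementation; it just counts how many allocation-unit chunks a single write dirties and what that costs in snapshot space or replication bandwidth.

KB, MB = 1024, 1024 * 1024

def chunks_touched(write_size, chunk_size, offset=0):
    # Count the allocation-unit chunks a write of write_size bytes,
    # starting at byte offset, will dirty (mark as changed).
    first = offset // chunk_size
    last = (offset + write_size - 1) // chunk_size
    return last - first + 1

page = 8 * KB  # SQL Server writes 8KByte pages

# One aligned 8KByte page update against three hypothetical chunk sizes.
for chunk in (4 * KB, 64 * KB, 16 * MB):
    dirtied = chunks_touched(page, chunk)
    cost = dirtied * chunk
    print(f"{chunk // KB:>6}KB chunks: {dirtied} dirtied, "
          f"{cost // KB}KB of snapshot space / WAN traffic")

With WAFL-style 4KByte chunks the update costs 8KBytes; with 16MByte chunks the very same 8KByte update costs 16,384KBytes, which is exactly the amplification described above.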
