All Snapshots Are Not Created Equal
September 30, 2010
Some days it seems that array vendors look at their snapshot facilities as a way to sell more capacity as much as a way to help their customers protect their data. The most egregious example of this was a major vendor's "unified" storage system that a client of mine was thinking about buying a few years ago. On this system the first snapshot of an iSCSI LUN was actually a full copy of the LUN's data. So storing 1TB of data and a reasonable number of snaps would take 2.5-3TB of disk. A more efficient system could need just 1.5TB.
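To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch. The 0.5TB of changed data held by snapshots is my assumption, inferred from the 1.5TB figure above, and the function name is purely illustrative.

```python
def capacity_needed_tb(data_tb, changed_tb, first_snap_is_full_copy):
    """Rough disk needed for a LUN plus its snapshots, in TB.

    data_tb: size of the live data
    changed_tb: changed blocks retained by snapshots (assumed figure)
    first_snap_is_full_copy: True if the first snap duplicates the whole LUN
    """
    full_copy_tb = data_tb if first_snap_is_full_copy else 0.0
    return data_tb + full_copy_tb + changed_tb


# Array whose first snapshot is a full copy of the LUN
print(capacity_needed_tb(1.0, 0.5, True))   # 2.5
# Array that stores only changed blocks
print(capacity_needed_tb(1.0, 0.5, False))  # 1.5
```

With the same 1TB of data and the same snapshot retention, the full-copy design burns an extra terabyte before a single block has changed.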
Requiring that snap space be allocated, or even reserved, on a per-volume basis is another pet peeve of mine. Like fat provisioning, snapshot reservations lead to inefficient disk usage by dedicating disk space to functions based on anticipated peak usage, not real demand. I want snapshot providers that take space from the free pool, but let me set a limit on how much space will be taken by snaps.
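The behavior I'm asking for can be sketched in a few lines. This is a toy model, not any vendor's implementation; the class name, method name and the numbers are all hypothetical.

```python
class SnapshotPool:
    """Toy model: snapshots draw from a shared free pool, up to a set limit."""

    def __init__(self, free_gb, snap_limit_gb):
        self.free_gb = free_gb              # unallocated space shared by everything
        self.snap_limit_gb = snap_limit_gb  # admin-set ceiling for all snapshots
        self.snap_used_gb = 0.0

    def allocate_snapshot_space(self, gb):
        """Take space on demand; refuse if the limit or the pool is exhausted."""
        if self.snap_used_gb + gb > self.snap_limit_gb:
            return False
        if gb > self.free_gb:
            return False
        self.free_gb -= gb
        self.snap_used_gb += gb
        return True


pool = SnapshotPool(free_gb=1000, snap_limit_gb=200)
print(pool.allocate_snapshot_space(150))  # True: within the limit
print(pool.allocate_snapshot_space(100))  # False: would exceed the 200GB limit
print(pool.free_gb)                       # 850: only space actually used is gone
```

Nothing is set aside up front. Space leaves the free pool only when a snapshot actually needs it, and the limit keeps a runaway snapshot from eating the array.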
Last, but certainly not least, is how big a block the snapshot technology uses. Storage guys hear block and think 512 bytes, but no storage system that I'm aware of uses chunks that small to allocate space or manage snapshots. NetApp's WAFL manages data in 4K paragraphs, while I've seen other systems use page- or even chapter-size chunks of 32K to 512K bytes. Host a database with 4K or 8K pages on a system with 512KB chapters, and your snapshots will take up several times the amount of space they would on one using 64K pages.
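A rough worst-case estimate shows where "several times" comes from. The sketch below assumes every database page write is chunk-aligned and lands in a different chunk, which is my simplification for scattered random updates; the function name and the write count are illustrative.

```python
def snapshot_bytes_worst_case(num_page_writes, page_size, chunk_size):
    """Snapshot space consumed if each page write dirties its own chunk(s).

    Assumes aligned writes that never share a chunk (worst case).
    """
    chunks_per_write = -(-page_size // chunk_size)  # ceiling division
    return num_page_writes * chunks_per_write * chunk_size


KB = 1024
writes = 1000  # hypothetical number of scattered 8K page updates

with_64k = snapshot_bytes_worst_case(writes, 8 * KB, 64 * KB)
with_512k = snapshot_bytes_worst_case(writes, 8 * KB, 512 * KB)

print(with_64k // KB)         # 64000 (KB preserved with 64K chunks)
print(with_512k // KB)        # 512000 (KB preserved with 512K chunks)
print(with_512k // with_64k)  # 8
```

Real workloads have some locality, so the actual ratio will usually be lower than 8 to 1, but the direction doesn't change.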
Allocation size becomes an even bigger issue if you're using a system that bases its replication on snapshots, as many lower-end systems from Dell/EqualLogic to Overland do. It's a rude awakening to discover that you need a bigger WAN link to your DR site because 4K updates generate 256K replications.
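Here is the same effect expressed as WAN bandwidth. This is a simple sketch that assumes each update dirties a separate chunk and ignores compression, deduplication and protocol overhead; the update rate is a made-up figure.

```python
def replication_mbps(updates_per_sec, bytes_sent_per_update):
    """Sustained link speed, in megabits per second, to keep up with changes."""
    return updates_per_sec * bytes_sent_per_update * 8 / 1_000_000


KB = 1024
rate = 100  # hypothetical sustained rate of scattered 4K updates per second

ideal = replication_mbps(rate, 4 * KB)     # ship only the 4K that changed
actual = replication_mbps(rate, 256 * KB)  # ship the whole 256K chunk

print(round(ideal, 2))   # 3.28
print(round(actual, 2))  # 209.72
print(actual / ideal)    # 64.0
```

A change rate that would fit on a couple of T1s at 4K granularity needs a link in the hundreds of megabits when every update drags a 256K chunk across the wire.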
If you're just using snapshots to store a consistent image of your data while a backup job completes, you don't have to worry too much about how your storage system creates and manages snapshots. If you want to be able to spawn test copies of your production servers, replicate your snapshots, or keep multiple snapshots for days as low-RTO (recovery time objective) restore points, the difference between ordinary snapshots and great ones will be significant.