Snapshots And Backups Part Deux
February 17, 2011
In What Is A Backup?, I compared conventional backups to local snapshots, concluding that conventional backups make it much easier to restore data when you don't know exactly where or when it was last seen. With conventional backups, an administrator, after cursing under his breath and wishing he could just say no to the CFO, can search the catalog database for *smith*.xls in Finance and locate the file. Since local snapshots don't include catalogs, it's much harder to restore data that disappeared sometime last summer. But there are other problems with using local snapshots for backup.
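To make the catalog advantage concrete, here is a minimal Python sketch of the kind of lookup a backup catalog enables. The `CatalogEntry` schema and `find_latest_copy` function are hypothetical, not any backup product's API; real catalogs live in a database, but the idea is the same: you search by name pattern and folder across every backup job without knowing which night the file was last backed up.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatch
from typing import List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """One file as recorded by one backup job (hypothetical schema)."""
    path: str
    backup_date: date


def find_latest_copy(
    catalog: List[CatalogEntry], folder: str, pattern: str
) -> Optional[CatalogEntry]:
    """Return the most recent catalog entry under `folder` whose file name
    matches the shell-style `pattern`, or None if nothing matches."""
    prefix = folder.rstrip("/") + "/"
    matches = [
        entry
        for entry in catalog
        if entry.path.startswith(prefix)
        # Match the pattern against the file name only, not the full path.
        and fnmatch(entry.path.rsplit("/", 1)[-1], pattern)
    ]
    # Newest backup wins; default=None covers the "never backed up" case.
    return max(matches, key=lambda entry: entry.backup_date, default=None)
```

A snapshot-only system has nothing equivalent to `catalog` here; you would have to mount snapshots one at a time and look.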
Part of the reason many storage administrators cringe at the thought of snapshots as a backup medium is that they still view the boxes of old backup tapes at Iron Mountain as a long-term data retention solution. Because a storage system can maintain only a limited number of snapshots, snapshots can't fill the long-term retention role that many backup admins still rely on their backup systems for.
Let's look at a couple of the vendors that emphasize snapshots as key to their solutions and responded to Chuck Hollis' When Is A Backup Really A Backup? NetApp systems can keep 255 snapshots of any given volume. Since NetApp stores snapshot data in the same RAID set as the primary data, which means on the same class of disks, keeping 255 snapshots will be expensive. Nimble Storage pitches its system as consolidating primary and backup data, holding 30 to 60 days of compressed backup data on SATA drives.
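How far back a fixed snapshot budget reaches depends entirely on how often you take snapshots. The sketch below assumes a simple rolling schedule (one snapshot every `interval`, oldest deleted once the limit is hit); the function name is illustrative, and real arrays let you mix hourly, daily and weekly schedules that all draw on the same per-volume limit.

```python
from datetime import timedelta


def retention_window(max_snapshots: int, interval: timedelta) -> timedelta:
    """How far back a rolling set of snapshots reaches when a new snapshot is
    taken every `interval` and the oldest is deleted at `max_snapshots`."""
    if max_snapshots < 1:
        raise ValueError("need at least one snapshot")
    # The oldest retained snapshot is (max_snapshots - 1) intervals older
    # than the newest one.
    return interval * (max_snapshots - 1)
```

With NetApp's 255-snapshot limit, `retention_window(255, timedelta(hours=1))` reaches back only 10 days and 14 hours, and even one snapshot a day reaches back 254 days, still well short of long-term retention. Covering Nimble's 30 to 60 days with nightly snapshots takes only 30 to 60 of them.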
Frankly, if you have a real archiving system, 60 days of
backup data should be plenty. The truth is, you rarely restore data older than
60 days. You may go on a fishing expedition
looking for data someone needs now that he or she deleted a
year ago, but archives, with their full-text indexes and deep metadata catalogs,
are much better places to fish for data than dusty old backup tapes.
My real problem with using local snapshots as backups is the
local part. Snapshots stored in the same system as the primary data depend on
the primary data and on the array that holds it. If the storage system fails,
you lose not just the primary data but the backups as well.
I was going to write about how I thought EMC's Chuck Hollis' comments, that
array failures are rare occurrences users can essentially ignore, were foolish
at best and irresponsible at worst. I was going to look up all sorts of
statistics about the likelihood of dual drive failures in RAID 5 systems and
really geek it up.
Then I got an e-mail from a client whose primary disk array suffered
a dual disk failure last Thursday. I'm spending the next few days helping them
with the aftermath. Once you've had to clean up after something like that, you
stop worrying about statistics. It happens, it's happened to me, and it's going
to happen to you. So local snapshots alone are not enough.
To make snapshots a sufficient backup system, you need to
replicate the data as well as take snapshots. When combining replication and
snapshots, I see three things you have to get right. First, you have to
replicate to an independent system in the same data center, or at least on
campus, so you can recover quickly from an array failure without activating
your whole disaster recovery plan. Then you have to replicate to a remote site
so you're covered in case of bigger problems like fires, floods and power
failures.
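One way to check a layout like this is to list the failure domains each copy depends on and ask which copies survive a given event. The sketch below is a hypothetical model: the copy names and domain labels are made up, and it assumes the campus replica shares the main site's exposure to fire, flood and power loss, which is why the remote copy is needed.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical three-copy layout, in restore-preference order (fastest first).
# Each copy names the failure domains it depends on.
COPIES: List[Tuple[str, FrozenSet[str]]] = [
    ("local snapshots", frozenset({"primary_array", "main_site"})),
    ("campus replica", frozenset({"campus_array", "main_site"})),
    ("remote replica", frozenset({"remote_array", "remote_site"})),
]


def surviving_copies(failed: Set[str]) -> List[str]:
    """Copies none of whose failure domains are in `failed`."""
    return [name for name, deps in COPIES if not deps & failed]


def best_restore_source(failed: Set[str]) -> Optional[str]:
    """Fastest surviving copy to restore from, or None if every copy is gone."""
    survivors = surviving_copies(failed)
    return survivors[0] if survivors else None
```

A dual disk failure on the primary array (`{"primary_array"}`) takes out the local snapshots along with the primary data, leaving the campus replica as the quick restore source; losing the whole main site leaves only the remote replica.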
Finally, you have to make sure all three sets of snapshots
are application consistent. It's easy to have the Windows Volume Shadow Copy
Service (VSS) or scripts quiesce your database for the local snapshot, but you
have to make sure the replication system in your storage arrays preserves that
snapshot timing. Often the easiest way is to use point-in-time replication that
sends the snapshot data from array to array, rather than replicating in real
time and creating snapshots on the target arrays, since snapshots taken
independently on the targets won't line up with the moment the database was
quiesced.
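The ordering that keeps every copy application consistent can be sketched as an orchestration function. All of the callables passed in (`quiesce`, `thaw`, `take_snapshot`, `send_snapshot`) are hypothetical hooks you would wire to VSS, a database script, or your array's own tools; this is an illustration of the sequence, not any vendor's API.

```python
from typing import Callable, Iterable


def snapshot_and_replicate(
    quiesce: Callable[[], None],
    thaw: Callable[[], None],
    take_snapshot: Callable[[], str],
    send_snapshot: Callable[[str, str], None],
    targets: Iterable[str],
) -> str:
    """Take one application-consistent snapshot, then ship that same
    snapshot to every target array. Returns the snapshot ID."""
    quiesce()
    try:
        # The only step that needs the application frozen.
        snap_id = take_snapshot()
    finally:
        # Resume the application even if the snapshot fails.
        thaw()
    for target in targets:
        # Point-in-time replication: each target receives the quiesced image
        # instead of snapshotting a continuously replicated volume at some
        # unrelated moment.
        send_snapshot(snap_id, target)
    return snap_id
```

Keeping the freeze window around the local snapshot only, and replicating afterward, means the application isn't held quiesced while data crosses the WAN.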
Once you get to three copies, snapshots can be a reasonable backup plan. However, with three copies to buy and maintain, snapshots can cost as much as conventional backups.
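As a back-of-the-envelope check on that cost point, the sketch below totals the extra raw capacity a three-copy snapshot scheme needs beyond the primary data. The model is a deliberately simple assumption, not any vendor's sizing method: snapshots store only changed blocks, every day's changes are unique, the local copy adds only snapshot space on the primary array, and each replica needs a full copy of the volume plus its own snapshots. It ignores compression and RAID overhead.

```python
def extra_capacity_tb(
    primary_tb: float, daily_change_rate: float, retention_days: int
) -> float:
    """Rough extra capacity (TB) for local snapshots plus campus and remote
    replicas, under the simplified assumptions described above."""
    # Changed blocks retained for the whole snapshot window.
    snapshot_delta = primary_tb * daily_change_rate * retention_days
    local = snapshot_delta                 # snapshot space on the primary array
    replica = primary_tb + snapshot_delta  # full copy plus its own snapshots
    return local + 2 * replica             # campus replica + remote replica
```

For a hypothetical 10 TB volume changing 2% a day with 30 days of snapshots, `extra_capacity_tb(10, 0.02, 30)` comes to 38 TB: two more full copies of the volume plus three sets of snapshot space.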