Analysis: Data De-Duping
March 16, 2007
Just a few years ago, disk-to-disk backup seemed almost too good to be true. Powered by inexpensive ATA (and later SATA) disk drives, D2D, whether implemented as virtual tape libraries or as a backup-to-disk option in your favorite backup application, made backups faster, eliminated mechanical failures in tape drives and libraries, and made it easier to deal with the continuous chorus of calls to the helpdesk for individual file restores.
Today, our disk-backup devices are filling up, and there's not enough space or power in the data center to add another petabyte of backup space, so we're keeping only two to three days' worth of backups on disk when we'd like to keep a month's worth. Problem is, there's too much duplicate data in our backup sets. The good news is, vendors--smelling money, of course--are promising that their new data de-duplication products can provide 20-to-1, or even 300-to-1, reductions in the amount of data we need to store. Can it be? Let's take a look.
De-duplication technology lets you store more backup data on a given set of disks. This can extend the period you keep disk backups and reduce your data center power and cooling costs. If you de-dupe data before sending it across the WAN, you can save on bandwidth, making online off-site backups practical at companies that used to rely on tape. The only drawback to data de-duplication is that it can slow down the backup process.
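To see where ratios like 20-to-1 can come from, here is a minimal sketch in Python. It assumes fixed-size 4-KB chunks identified by SHA-256 hashes, which is only one possible approach and not necessarily what any given vendor ships. The function name `dedupe_store` and the workload (20 identical nightly full backups) are hypothetical, chosen to make the arithmetic easy to follow.

```python
import hashlib

def dedupe_store(backups, chunk_size=4096):
    """Keep one copy of each unique fixed-size chunk.

    Returns (logical_bytes, stored_bytes): the bytes the backup jobs
    wrote versus the bytes that actually landed on disk.
    """
    store = {}   # SHA-256 digest -> chunk bytes (assumed chunk index)
    logical = 0
    for data in backups:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            logical += len(chunk)
            # A chunk seen before costs no additional space.
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    stored = sum(len(c) for c in store.values())
    return logical, stored

# Hypothetical workload: 20 nightly full backups of the same ten 4-KB blocks.
blocks = [bytes([n]) * 4096 for n in range(10)]
nightly = b"".join(blocks)
logical, stored = dedupe_store([nightly] * 20)
ratio = logical / stored
print(f"{logical} bytes written, {stored} bytes stored, {ratio:.0f}-to-1")
```

In this contrived case, every night after the first adds nothing new, so the ratio simply equals the number of backups retained. Real data changes between backups, so actual ratios depend on how much changes and how long backups are kept. The per-chunk hashing also hints at why de-duplication can slow down the backup process.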