
Analysis: Data De-Duping

Just because de-duping runs in the background doesn't mean you should ignore its performance. If your VTL hasn't finished digesting the weekend's backups by the time you start backing up your servers again on Monday night, you may not be happy with the results: disk space may not be available, or the de-duping process may slow your backups.

Bandwidth Conservation

Saving disk space on a backup appliance isn't the only application of subfile de-duping technology. A new generation of backup applications, including Asigra's Televaulting, EMC's Avamar Axion and Symantec's NetBackup PureDisk, use hash-based data de-duplication to reduce the bandwidth needed to send backups across a WAN.

First, like any conventional backup application making an incremental backup, these applications use the usual methods, such as archive bits, last-modified dates and the file system change journal, to identify the files that have changed since the last backup. They then slice, dice and julienne each changed file into smaller blocks and calculate a hash for each block.
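
To make the slicing and hashing concrete, here is a minimal Python sketch. The fixed 4-KB block size and the use of SHA-256 are illustrative assumptions; shipping products typically pick their own hash and often use variable, content-defined chunk boundaries rather than fixed ones.

    import hashlib

    BLOCK_SIZE = 4096  # illustrative fixed size; real products often use
                       # variable, content-defined chunk boundaries instead

    def hash_blocks(path):
        """Split a file into blocks and return the hash of each block."""
        hashes = []
        with open(path, "rb") as f:
            while block := f.read(BLOCK_SIZE):
                # SHA-256 stands in for whatever hash a given product uses
                hashes.append(hashlib.sha256(block).hexdigest())
        return hashes
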

The hashes are then compared with a local cache of hashes for blocks that have already been backed up at the local site. Any hashes not found in the local cache, along with the file system metadata, are sent to the central backup server, which checks them against its own hash tables. The backup server replies with a list of the hashes it hasn't seen before; the server being backed up then sends the data blocks those hashes represent to the central server for safekeeping.
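
Here is a minimal Python sketch of that three-step negotiation, building on the hashing example above. The server object and its unknown_hashes and store methods are hypothetical stand-ins for whatever wire protocol a given product actually uses, and local_cache is assumed to be a set of hashes already backed up from this site.

    import hashlib

    BLOCK_SIZE = 4096  # same illustrative block size as above

    def backup_file(path, local_cache, server):
        """Back up one file, sending only blocks the central server lacks."""
        # Slice the file into blocks and hash each one
        blocks = {}
        with open(path, "rb") as f:
            while block := f.read(BLOCK_SIZE):
                blocks[hashlib.sha256(block).hexdigest()] = block

        # Step 1: discard hashes already seen at the local site
        candidates = [h for h in blocks if h not in local_cache]

        # Step 2: the central server reports which hashes are new to it
        #         (unknown_hashes is a hypothetical protocol call)
        new_hashes = server.unknown_hashes(candidates)

        # Step 3: ship only the blocks behind genuinely new hashes
        #         (store is likewise hypothetical)
        for h in new_hashes:
            server.store(h, blocks[h])

        # Remember every hash locally so the next run skips these blocks
        local_cache.update(blocks)

The point of the two-stage check is bandwidth: the local cache filters out blocks the site has already sent, and the server-side comparison filters out blocks any site has already sent, so only hashes and genuinely new blocks ever cross the WAN.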