More On Chunking

Deduplication software, especially applications that deduplicate at the source server like Avamar, PureDisk or Asigra's Cloud Backup, also uses the start and end of each file to determine chunk boundaries. These applications first identify the files that have changed, as a conventional incremental backup would, and then chunk each of those files.
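A minimal sketch of that source-side flow follows. The in-memory file table, the modification times, the tiny 8-byte chunk size and the SHA-256 chunk index are all hypothetical stand-ins chosen to keep the example small; none of it is taken from Avamar, PureDisk or Asigra.

```python
import hashlib

CHUNK_SIZE = 8  # hypothetical; tiny so the example is easy to follow (real products use KB-sized chunks)


def changed_files(files, last_backup_time):
    """Like an incremental backup: pick files modified since the last run.
    `files` maps path -> (mtime, contents), a hypothetical stand-in for a file system scan."""
    return [path for path, (mtime, _) in files.items() if mtime > last_backup_time]


def chunk_file(data, size=CHUNK_SIZE):
    """Fixed-size chunks whose boundaries start at offset 0 of this particular file."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(files, last_backup_time, store):
    """Chunk each changed file and keep only chunks whose hash is not already in `store`.
    Returns the number of new chunks written."""
    new_chunks = 0
    for path in changed_files(files, last_backup_time):
        _, data = files[path]
        for chunk in chunk_file(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:
                store[digest] = chunk
                new_chunks += 1
    return new_chunks


store = {}
files = {
    "a.txt": (1, b"AAAAAAAABBBBBBBB"),
    "b.txt": (1, b"AAAAAAAACCCCCCCC"),  # first chunk duplicates a.txt's first chunk
}
first_run = backup(files, last_backup_time=0, store=store)
files["a.txt"] = (2, b"AAAAAAAABBBBBBBD")  # one byte changed in a.txt's second chunk
second_run = backup(files, last_backup_time=1, store=store)
print(first_run, second_run)  # 3 1
```

On the second run only `a.txt` is scanned, and only its modified chunk is new. Because every file's chunking restarts at that file's first byte, identical content in different files lines up on the same chunk boundaries.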

File boundaries can also improve fixed-size chunking on backup targets, provided the target's deduplication engine understands the format of the tape-format or aggregate, tarball-like files your backup application writes its data in. Such an engine can find the start and end of each file within the tarball and realign its chunks to those boundaries. This content awareness also lets backup appliances recognize the index marks and catalog data that backup applications insert into the tarball, so that this metadata doesn't throw off the chunking.
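To illustrate why realignment matters, the sketch below uses Python's standard `tarfile` module to build two nightly archives. The second night adds one small file ahead of an unchanged 32 KiB file. Naive fixed chunking over the raw archive sees that file's data shifted to a new offset; chunking each member's data from its own start, with tar headers skipped, finds the unchanged file's chunks again. The 4 KiB chunk size, the file names and the use of plain tar are assumptions for illustration; real appliances parse their backup application's own formats.

```python
import hashlib
import io
import tarfile

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def make_tar(files):
    """Build an in-memory tar archive from {name: bytes}, standing in for a backup app's aggregate file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def naive_chunks(stream, size=CHUNK_SIZE):
    """Fixed chunks laid over the whole archive, ignoring headers and file boundaries."""
    return {hashlib.sha256(stream[i:i + size]).digest() for i in range(0, len(stream), size)}


def aware_chunks(stream, size=CHUNK_SIZE):
    """Parse the tar headers and restart fixed chunking at the start of each file's data."""
    hashes = set()
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is None:  # directories, devices, etc. carry no file data
                continue
            data = f.read()
            for i in range(0, len(data), size):
                hashes.add(hashlib.sha256(data[i:i + size]).digest())
    return hashes


# 32 KiB of non-repeating data for the "big" file.
data = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(1024))
night1 = make_tar({"big.dat": data})
night2 = make_tar({"small.txt": b"x" * 100, "big.dat": data})

naive_new = len(naive_chunks(night2) - naive_chunks(night1))
aware_new = len(aware_chunks(night2) - aware_chunks(night1))
print("new chunks, naive:", naive_new, "content-aware:", aware_new)
```

In the content-aware case the only new chunk is the one holding `small.txt`; the naive case stores most of `big.dat` again because its bytes no longer sit on the same 4 KiB boundaries.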

However, fixed-chunk systems can choke on some data. I know of one Data Domain user who tested Symantec's PureDisk deduplication with their Exchange backups. Their Data Domain systems held 40 backups of the Exchange servers in a given amount of storage, but PureDisk couldn't fit even four backups of the same Exchange data into that amount of storage. Exchange data consists of a small number of large database files whose contents change internally between backups, which is the worst case for PureDisk's dedupe engine. Now, if you used a fixed-chunk dedupe engine where the chunk was smaller than a database page ...
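A rough worked example of that last point follows, under loudly hypothetical assumptions: a 4 MiB database file with 8 KiB pages, where one page in every 16 is rewritten in place between backups. Because pages change in place, fixed chunks stay aligned; what matters is how much unchanged data each modified page forces the engine to store along with it. The page size, change pattern and chunk sizes are illustrative, not measurements of Exchange or of any product.

```python
import hashlib

PAGE = 8 * 1024  # hypothetical database page size
PAGES = 512      # 512 pages x 8 KiB = 4 MiB database file


def make_db(changed_pages=frozenset()):
    """Hypothetical database file: every 32-byte block is unique, and pages in
    `changed_pages` get different contents, modelling an in-place page update."""
    out = bytearray()
    for p in range(PAGES):
        version = 1 if p in changed_pages else 0
        for j in range(PAGE // 32):
            out += hashlib.sha256(f"{p}:{version}:{j}".encode()).digest()
    return bytes(out)


def chunk_index(data, chunk_size):
    """Map chunk hash -> chunk length for fixed-size chunks starting at offset 0."""
    index = {}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        index[hashlib.sha256(chunk).digest()] = len(chunk)
    return index


def stored_bytes(old, new, chunk_size):
    """Bytes a fixed-chunk engine must store for `new` after already holding `old`."""
    seen = chunk_index(old, chunk_size)
    return sum(n for h, n in chunk_index(new, chunk_size).items() if h not in seen)


night1 = make_db()
night2 = make_db(changed_pages={p for p in range(PAGES) if p % 16 == 0})  # 32 scattered pages

big = stored_bytes(night1, night2, 128 * 1024)   # chunk larger than a page
small = stored_bytes(night1, night2, 4 * 1024)   # chunk smaller than a page
print(big // 1024, "KiB vs", small // 1024, "KiB")  # 4096 KiB vs 256 KiB
```

With 128 KiB chunks every chunk contains a modified page, so the whole file is stored again; with 4 KiB chunks only the modified pages are. Real change patterns and real products' chunk sizes will differ.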

Disclosure: DeepStorage.net has done work for NetApp, Symantec and EMC, whose products were mentioned in this post.