Network Computing is part of the Informa Tech Division of Informa PLC

Analysis: Data De-Duping: Page 5 of 9

In addition to their de-duping approach, backup targets differ in their physical architectures. Data Domain, ExaGrid and Quantum make monolithic appliances that contain their disk arrays. The Data Domain and Quantum appliances can have NAS or VTL interfaces, while ExaGrid is always a NAS. Diligent and FalconStor sell their products as software, running on an Intel or Opteron server, to create a VTL gateway to external storage.

Although a backup appliance with a VTL interface may seem more sophisticated and could be easier to integrate into an existing tape-based backup environment, using a NAS interface gives your backup application more control over virtual media management. When a backup file reaches the end of its retention period, some backup apps, including Symantec's NetBackup, can delete the file from their disk repository. When a de-duping NAS appliance sees the deletion, it can reclaim the space and hash-index entries that no remaining backup references. Since you don't delete tapes, there's no way to release space on a VTL until the virtual tape is overwritten.
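To make the space-reclamation point concrete, here is a minimal sketch of a reference-counted chunk store, one plausible way a de-duping NAS target could free space when the backup app deletes an expired file. The class name `DedupStore`, the fixed 4-byte chunks and the SHA-256 hashing are illustrative assumptions, not a description of any vendor's implementation.

```python
import hashlib


class DedupStore:
    """Toy de-duplicating file store with reference-counted chunks (hypothetical)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # assumed fixed-size chunking, for simplicity
        self.chunks = {}              # hash -> chunk bytes (the de-duped data)
        self.refcount = {}            # hash -> number of references from stored files
        self.files = {}               # file name -> ordered list of chunk hashes

    def write(self, name, data):
        """Store a new backup file, keeping only chunks not already present."""
        hashes = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            if h not in self.chunks:
                self.chunks[h] = chunk
            self.refcount[h] = self.refcount.get(h, 0) + 1
            hashes.append(h)
        self.files[name] = hashes

    def delete(self, name):
        """Drop a file; free any chunk (and its index entry) nothing else references."""
        for h in self.files.pop(name):
            self.refcount[h] -= 1
            if self.refcount[h] == 0:
                del self.refcount[h]
                del self.chunks[h]

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
store.write("monday", b"AAAABBBBCCCC")
store.write("tuesday", b"AAAABBBBDDDD")
before = store.stored_bytes()
store.delete("monday")   # only the chunk unique to Monday's backup is released
after = store.stored_bytes()
print(before, after)
```

A VTL has no equivalent of `delete` in this sketch: the appliance learns that a virtual tape's contents are obsolete only when the backup app overwrites the tape.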

Of course, there is a price to pay for fitting 25 TB of data in a 1-TB bag, and not just in dollars. All the work of slicing your data into chunks and indexing it to remove the duplicates does slow things down more than just a little. A midrange VTL like an Overland REO 9000 can back up data at 300 MBps or better. Diligent has been able to achieve 200-MBps backup rates on its ProtecTier in third-party benchmarks, but that required a quad Opteron server front-ending an array of more than 100 disk drives.
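A quick back-of-the-envelope calculation shows what the throughput gap means for a backup window. The 300-MBps and 200-MBps rates come from the figures above; the 10-TB full backup is an assumed size chosen only for illustration, and the sketch treats 1 TB as 1,000,000 MB.

```python
def backup_hours(data_tb, rate_mb_per_s):
    """Hours needed to write data_tb terabytes at a sustained rate in MB per second."""
    megabytes = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return megabytes / rate_mb_per_s / 3600


full_backup_tb = 10  # assumed full-backup size, not a figure from the article
plain_vtl = backup_hours(full_backup_tb, 300)   # non-de-duping midrange VTL
inline_dedupe = backup_hours(full_backup_tb, 200)  # in-line de-duping rate cited above
print(f"{plain_vtl:.1f} h vs {inline_dedupe:.1f} h")
```

Under those assumptions the same full backup takes about 9.3 hours at 300 MBps and about 13.9 hours at 200 MBps, roughly a 50 percent longer window.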

Other vendors address the problem by de-duping the data as a separate process that runs after the backup. On a system running FalconStor's VTL software, data is written from the backup app to a compressed but not de-duped virtual tape file. Then a background process chunks the data, removes the duplicates and creates a "virtual virtual tape": an index of which de-duped data blocks were on the original virtual tape. Once the data from a virtual tape is de-duped, the space it occupied is returned to the available space pool. Sepaton's DeltaStor and ExaGrid also perform their de-duping as a post-backup process.
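The post-process flow described above can be sketched in a few lines. This is a simplified illustration, not FalconStor's or any other vendor's code: the function names are hypothetical, chunks are a fixed 4 bytes, SHA-256 stands in for whatever hash a real product uses, and the compression step is left out.

```python
import hashlib


def post_process(staged_tapes, chunk_store, chunk_size=4):
    """De-dupe staged virtual tapes in the background.

    staged_tapes: dict of tape name -> raw bytes written by the backup app
    chunk_store:  dict of hash -> chunk bytes, shared by all de-duped tapes
    Returns a dict of tape name -> ordered list of chunk hashes (the index
    that stands in for the original virtual tape).
    """
    indexes = {}
    for name in list(staged_tapes):
        data = staged_tapes.pop(name)  # staging space returns to the free pool
        index = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            chunk_store.setdefault(h, chunk)  # store each unique chunk once
            index.append(h)
        indexes[name] = index
    return indexes


def restore(index, chunk_store):
    """Rebuild a tape's original contents from its index."""
    return b"".join(chunk_store[h] for h in index)


staged = {"tape1": b"AAAABBBBAAAA", "tape2": b"BBBBCCCC"}
store = {}
indexes = post_process(staged, store)
print(len(store), restore(indexes["tape1"], store))
```

Note that until `post_process` runs, the full-size staged tapes and the de-duped chunk store both occupy disk, which is the capacity cost of this approach.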

Although post-processing can boost backup speeds, it has its own costs. A system that does post-process de-duping must have enough disk space to hold a full set of standard backups in addition to its de-duped data. If you're looking to keep to a weekly full/daily incremental backup schedule, you may need a couple of times more disk space on a system that de-dupes in the background, because it has to hold those full backups until it can digest them.
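A rough sizing model illustrates the extra capacity. Everything here is an assumption made for the sake of the arithmetic: a 10-TB weekly full, six 1-TB daily incrementals, 12 weeks of retention, the 25:1 reduction mentioned earlier, and a staging area sized to hold one undigested full backup. Real systems will differ.

```python
def inline_capacity_tb(full_tb, incr_tb, weeks, ratio):
    """Disk needed when data is de-duped as it arrives."""
    logical_tb = weeks * (full_tb + 6 * incr_tb)  # one full plus six incrementals per week
    return logical_tb / ratio


def post_process_capacity_tb(full_tb, incr_tb, weeks, ratio):
    """Disk needed when a full backup must be staged before it is de-duped."""
    staging_tb = full_tb  # assumed: room for one full backup awaiting de-duping
    return inline_capacity_tb(full_tb, incr_tb, weeks, ratio) + staging_tb


inline = inline_capacity_tb(10, 1, 12, 25)
post = post_process_capacity_tb(10, 1, 12, 25)
print(f"in-line: {inline:.1f} TB, post-process: {post:.1f} TB")
```

With these inputs the in-line system needs roughly 7.7 TB and the post-process system roughly 17.7 TB, a bit more than double. The multiple is sensitive to the retention period and the reduction ratio you assume.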