Network Computing is part of the Informa Tech Division of Informa PLC

Deduplicating Replication - Quantum

Quantum's deduplication method is called adaptive inline deduplication: the appliance automatically shifts between inline and post-process deduplication as needed. If the system determines that deduplication work is throttling backup ingest, it switches into post-process mode for as long as necessary. Post-process deduplication can also be forced by configuring a backup window that defers the deduplication work; Quantum calls this deferred mode. The mode in which the unit performs its deduplication affects how it replicates data.
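To make the mode-selection logic concrete, here is a minimal sketch of how an adaptive engine might decide between inline and post-process deduplication. The function name, the throughput inputs, and the backlog threshold are all hypothetical illustrations, not Quantum's actual algorithm:

```python
from enum import Enum

class DedupeMode(Enum):
    INLINE = "inline"
    POST_PROCESS = "post-process"

def choose_mode(ingest_mb_s: float, dedupe_mb_s: float,
                deferred: bool = False,
                backlog_ratio_limit: float = 1.5) -> DedupeMode:
    """Pick a deduplication mode for the next interval (illustrative).

    If deferred mode is enabled (a backup window is configured),
    deduplication is always postponed. Otherwise, fall back to
    post-process whenever incoming data outpaces the dedupe engine
    by more than the (hypothetical) backlog threshold.
    """
    if deferred:
        return DedupeMode.POST_PROCESS
    if dedupe_mb_s > 0 and ingest_mb_s / dedupe_mb_s > backlog_ratio_limit:
        return DedupeMode.POST_PROCESS
    return DedupeMode.INLINE
```

The key design point is that the decision is re-evaluated continuously, so the unit can drop into post-process mode only while ingest actually outruns the dedupe engine, then return to inline operation.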

Quantum's deduplication leverages disk. When the backup application sends a backup job, the data is chopped into variable-size chunks that are staged on disk, rather than held in memory, while the deduplication process executes. The systems can also store most data in native, non-deduplicated format. Assuming you have the space, this can speed recoveries of complete systems by avoiding the need to rehydrate (undeduplicate) data as it is restored.
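The mechanics of variable-size chunking can be sketched as follows. This toy example uses a simple rolling sum to pick content-defined boundaries and SHA-256 to fingerprint chunks; real appliances use stronger rolling fingerprints (e.g. Rabin hashes), and the sizes and thresholds here are illustrative assumptions:

```python
import hashlib

def chunk_stream(data: bytes, avg_bits: int = 12,
                 min_size: int = 1024, max_size: int = 16384):
    """Split data into variable-size chunks at content-defined
    boundaries (toy rolling hash; parameters are hypothetical)."""
    mask = (1 << avg_bits) - 1
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        # Cut at a hash-defined boundary, or force a cut at max_size.
        if size >= max_size or (size >= min_size and (rolling & mask) == 0):
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])    # trailing partial chunk
    return chunks

def unique_blocks(chunks):
    """Index chunks by fingerprint; duplicates collapse to one copy."""
    store = {}
    for c in chunks:
        store.setdefault(hashlib.sha256(c).hexdigest(), c)
    return store
```

Because boundaries depend on content rather than fixed offsets, identical regions of data tend to produce identical chunks even when surrounding data shifts, which is what makes the duplicate lookup effective.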

As it relates to replication, with the unit in adaptive inline mode you do not have to wait until the whole job is done to start the replication process. As soon as the first chunk of data lands on disk it is deduplicated, so the unique variable-size blocks are ready to be replicated to the secondary site.
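This overlap of deduplication and replication is essentially a producer-consumer pipeline. The sketch below, with hypothetical names and a caller-supplied `send` function, shows the idea: each unique chunk is handed to a replication worker the moment it is deduplicated, rather than after the job completes:

```python
import hashlib
from queue import Queue
from threading import Thread

def inline_pipeline(chunks, send):
    """Deduplicate chunks as they arrive and replicate unique ones
    immediately (illustrative sketch, not Quantum's implementation)."""
    q: Queue = Queue()

    def replicator():
        # Drain the queue until the None sentinel arrives.
        while True:
            block = q.get()
            if block is None:
                break
            send(block)

    t = Thread(target=replicator)
    t.start()

    seen = set()
    for c in chunks:
        fp = hashlib.sha256(c).hexdigest()
        if fp not in seen:       # unique chunk: queue for replication now
            seen.add(fp)
            q.put(c)
    q.put(None)                  # signal end of job
    t.join()
```

For example, feeding the pipeline `[b"a", b"b", b"a"]` replicates only `b"a"` and `b"b"`; the duplicate is dropped before it ever reaches the wire.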

If the unit is in post-process mode, you have to wait until the entire job is complete. Because post-process deduplication delays the start of replication, as it does with other post-process deduplication products, you may want to restructure your backups into more, smaller jobs to distribute the load more evenly where possible.

The replication process itself fits my definition of global deduplication: if three sites are sending data to the disaster recovery site, only data that is unique across all three sites is sent to the DR site. Each site sends the DR site a list of the blocks it needs to replicate. If the DR site recognizes a block it has already received, regardless of source, it tells the sending site not to transmit it. This further thins the bandwidth requirements of many-to-one replication strategies.
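The many-to-one negotiation described above can be sketched as a fingerprint exchange. The class and function names here are hypothetical; the point is that the DR target tracks every block it has ever received, from any source, and each site transfers only what the target asks for:

```python
import hashlib

class DRSite:
    """Disaster-recovery target tracking every block fingerprint
    it has received, regardless of which source site sent it."""
    def __init__(self):
        self.seen = {}           # fingerprint -> stored block

    def negotiate(self, fingerprints):
        """Return only the fingerprints this site still needs."""
        return [fp for fp in fingerprints if fp not in self.seen]

    def receive(self, blocks):
        for b in blocks:
            self.seen[hashlib.sha256(b).hexdigest()] = b

def replicate(site_blocks, dr):
    """One replication round from a source site: advertise
    fingerprints first, then send only the blocks the DR requested."""
    by_fp = {hashlib.sha256(b).hexdigest(): b for b in site_blocks}
    needed = dr.negotiate(list(by_fp))
    dr.receive([by_fp[fp] for fp in needed])
    return len(needed)           # blocks actually transferred
```

So if site A replicates blocks X and Y, and site B then replicates Y and Z, only Z crosses the wire in the second round, because the DR site already holds Y from site A.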
