Data Domain is next up in our vendor re-interview blog series. As I have said before, the deduplication method a system uses affects how replication works in any environment. Data Domain is an inline-only system, meaning that data is deduplicated on the fly as it comes in. As a result, the replication process should be able to start sooner. Data Domain has been delivering replication for a while, and its product has some advanced capabilities.
The inline nature of Data Domain's systems means there is no post-process job that needs to execute prior to replication. Depending on the protocol used, each individual backup job is ready to replicate almost as soon as it completes. The performance advantage that Data Domain may have here over some of the competition depends on how long a competitor's post-process deduplication step takes, and on how important it is that a DR copy of your data lands at the remote location within a given window of time. Another important factor is whether the inline system can keep pace with your backup job. If the inline system forces the backup process itself to slow down, that delays the point at which replication can start. As you look at inline systems, it's important to make sure they can keep pace with the speed at which your backup process can deliver data.
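To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not a model of any vendor's product: the function names, the `ingest_slowdown` factor, and every number are hypothetical, and it makes the simplifying assumption that replication starts only once the job (or the post-process pass) has finished.

```python
def inline_dr_ready_hours(backup_hours, ingest_slowdown, replication_hours):
    """Hours until the DR copy has landed, for an inline system.

    ingest_slowdown is a hypothetical multiplier: 1.0 means the appliance
    keeps pace with the backup stream, 1.5 means the job takes 50% longer.
    """
    return backup_hours * ingest_slowdown + replication_hours


def post_process_dr_ready_hours(backup_hours, dedupe_hours, replication_hours):
    """Hours until the DR copy has landed, for a post-process system.

    The backup runs at full speed, but a separate deduplication pass
    must finish before replication can begin.
    """
    return backup_hours + dedupe_hours + replication_hours


if __name__ == "__main__":
    # Assumed workload: a 4-hour backup, 2 hours to replicate the changes.
    print(inline_dr_ready_hours(4, 1.25, 2))     # inline, mild slowdown
    print(inline_dr_ready_hours(4, 1.75, 2))     # inline, heavy slowdown
    print(post_process_dr_ready_hours(4, 2, 2))  # 2-hour post-process pass
```

With these made-up figures, the inline system wins when it slows the backup only slightly and loses when the slowdown exceeds the length of the post-process pass. That is the reason to measure ingest rates against your own backup streams.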
Data Domain also provides strong support for Symantec's OpenStorage (OST) API. With EMC's recent acquisition of Data Domain, it's safe to assume that we will see similar integration with EMC's NetWorker and possibly Avamar. One example of how OST integration helps replication is that all of the replication jobs can be scheduled and managed through the NetBackup interface. Then there is the optimized duplication capability, which lets NetBackup duplicate a backup image to a secondary appliance without routing the data through a media server. Once the duplication is complete, the device notifies NetBackup through the OST API, and the NetBackup catalog is updated automatically. NetBackup can then use the secondary copy as easily as the primary copy.
Beyond integration with backup applications, Data Domain's replication software also supports cascaded replication: a primary site can replicate to a secondary site, which in turn replicates to a tertiary site. Additionally, there is support for high-speed replication. Often the focus of DR replication is how small a WAN connection can be used. Sometimes, however, you have a high-bandwidth connection between two sites. Ironically, some systems' replication functions can't take advantage of the additional bandwidth. Data Domain's software has a high-performance mode that allows multi-stream transfers across high-bandwidth connections for faster DR completion times. Finally, the systems themselves now support cross-site deduplication with a 180:1 fan-in ratio. As we described in our first entry, this is the ability to optimize WAN bandwidth by not requiring a site to send data that a different site has already sent.
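The cross-site deduplication idea can be illustrated with a small Python sketch. This is a generic illustration under stated assumptions, not Data Domain's actual protocol: the `HubAppliance` class, the `replicate` function, and the use of SHA-256 fingerprints over whole segments are all hypothetical choices made for the example.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Identify a segment by its content hash (SHA-256 is an assumption)."""
    return hashlib.sha256(segment).hexdigest()


class HubAppliance:
    """Hypothetical central appliance that many remote sites replicate to."""

    def __init__(self):
        self.store = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Return the fingerprints the hub does not hold yet."""
        return [fp for fp in fingerprints if fp not in self.store]

    def receive(self, segments):
        for seg in segments:
            self.store[fingerprint(seg)] = seg


def replicate(site_segments, hub):
    """Send only segments the hub lacks; return payload bytes sent over the WAN."""
    fps = [fingerprint(s) for s in site_segments]
    needed = set(hub.missing(fps))
    to_send = []
    for seg, fp in zip(site_segments, fps):
        if fp in needed:
            to_send.append(seg)
            needed.discard(fp)  # do not send the same segment twice in one batch
    hub.receive(to_send)
    return sum(len(s) for s in to_send)


if __name__ == "__main__":
    hub = HubAppliance()
    # Site A replicates first, so everything it holds is new to the hub.
    print(replicate([b"os-image", b"payroll-db"], hub))
    # Site B shares the OS image with site A, so only its own data crosses the WAN.
    print(replicate([b"os-image", b"sales-db"], hub))
```

In this toy run the second site sends only the segment the hub has never seen, even though the shared segment originally arrived from a different site. A real system would also count the cost of exchanging the fingerprints themselves, which the sketch ignores.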
2/23/2010 It has come to our attention that at the time this story was posted, George Crump was doing business with EMC. Because he was doing a series of interviews with all the storage deduplication vendors, I didn't feel this was a conflict, nor, upon review of this article and discussions with Crump, do I think this article is biased. In the spirit of full disclosure, I have added this note. In addition, George Crump, in his role at Storage Switzerland, has ongoing business relationships with various vendors in the storage and deduplication marketplace. Any failure to disclose is my mistake, not Crump's.