Detailing Deduplication's Replication Mode: ExaGrid
December 8, 2009
My next re-interview in our ongoing discussion on dedupe is with ExaGrid. This is a target appliance that looks to your backup software like a network mount point. As the name implies, ExaGrid leverages a grid, or clustered, storage foundation to provide scale. The advantage is that as data grows, the performance and capacity of the backup process are not gated by a single controller or system. With its latest release, ExaGrid claims to support 100TB in a single grid and up to 1.8TB per hour per node (EX10000E nodes). ExaGrid is a post-process system. In fact, it keeps the latest copy of data in a compressed but non-deduplicated form to help with restore performance. Deduplication retains only the byte-level changes of previous versions of the backup files, and ExaGrid achieves approximately the same deduplication rates as other suppliers.
As it relates to replication, similar to FalconStor, which was discussed in our last entry, ExaGrid performs the replication process at the same time it deduplicates, as each individual backup job completes. Data lands on the device, and when a job completes the deduplication process kicks off; unique data is identified and stored in the deduplicated storage area as it is replicated to the remote site. ExaGrid uses a forward referencing technique to make sure the newest backup is at the front, if you will, of the restore chain, lowering the performance impact in a recovery situation. Since ExaGrid is a loosely coupled storage grid, each backup job is stored complete on a discrete node. This allows each node in the grid to perform deduplication and replication on its own data. As you add capacity via nodes, the deduplication and replication performance of the grid scales with it, but you are gated by the per-job performance of the individual node. A single job cannot aggregate performance across multiple nodes. For the mid-sized data center, ExaGrid's target market, this should not be a big issue.
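The forward-referencing idea above can be sketched in a few lines of code: keep the newest backup whole, and reduce each older version to a byte-level delta against the version that replaced it, so restoring the most recent backup touches no deltas at all. This is a minimal illustration, not ExaGrid's actual implementation; the naive byte-comparison delta stands in for whatever algorithm a real product would use.

```python
# Sketch of a "forward referencing" store: newest backup kept whole,
# older versions stored as byte-level deltas against their successor.
# All names here are illustrative assumptions, not a vendor API.

def make_delta(old: bytes, new: bytes):
    """Record what must change to turn `new` back into `old`.
    Naive: falls back to storing `old` whole if the sizes differ."""
    if len(old) != len(new):
        return ("full", old)
    return ("patch", [(i, o) for i, (n, o) in enumerate(zip(new, old)) if n != o])

def apply_delta(new: bytes, delta):
    kind, body = delta
    if kind == "full":
        return body
    buf = bytearray(new)
    for i, o in body:
        buf[i] = o
    return bytes(buf)

class ForwardReferencingStore:
    def __init__(self):
        self.latest = None   # newest backup, stored whole (non-deduplicated)
        self.deltas = []     # deltas[i] turns version i+1 back into version i

    def ingest(self, backup: bytes):
        # Post-process style: the new backup lands whole; only then is
        # the previous "latest" demoted to a delta against it.
        if self.latest is not None:
            self.deltas.append(make_delta(self.latest, backup))
        self.latest = backup

    def restore(self, version: int) -> bytes:
        # Newest version: zero deltas applied. Older versions: walk
        # the chain backward from the front.
        data = self.latest
        for delta in reversed(self.deltas[version:]):
            data = apply_delta(data, delta)
        return data

store = ForwardReferencingStore()
store.ingest(b"monday backup v1")
store.ingest(b"monday backup v2")
store.ingest(b"tuesday full set!")   # different length -> whole-copy fallback
assert store.restore(2) == b"tuesday full set!"  # newest: direct read
assert store.restore(0) == b"monday backup v1"   # oldest: two deltas applied
```

Note the asymmetry this buys you: restores of the most recent backup, by far the most common case, never pay the delta-reassembly cost; only reaching back to older versions walks the chain.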
ExaGrid does support a hub-and-spoke replication model, but not to the level that FalconStor, Data Domain and others do, so this is not the product to replicate from 140 branch offices to a single data center. It is designed for data center to data center replication, and they have customers doing 8 to 1 replication. Again, given ExaGrid's target market of medium-sized businesses, that should not be a show stopper. They currently do not support Symantec NetBackup's OST, although it is on the roadmap; when they do, it will allow NetBackup to control both the deduplication and replication processes. It is also unclear whether ExaGrid does WAN-optimized dedupe.
With the ExaGrid product, and others we will examine, you want to make sure you don't have one long backup job running to a single node. If a single job takes eight hours to complete, you will need to wait until that point to begin the replication process. In most environments this should not present a great challenge. It may cause some re-architecting of backup jobs, but nothing outside the normal expectations of a backup administrator.
The next entry in our deduplication/replication focus is EMC Data Domain. Several additional vendors have reached out to me to be included in our summary. If I haven't heard from you, please make sure you get in touch with me so we can include you in the discussion.