In my last entry I looked at scaling single-system solutions; in this entry I'll look at scaling backup deduplication via the storage cluster approach delivered by companies like Sepaton and Exagrid. The idea here is to make adding capacity and performance as simple as adding another node to the cluster. Each time you add a node, capacity and performance scale with it. This may be ideal for the enterprise and even for rapid-growth mid-tier companies.
Not all clustered storage systems are created equal. As we discussed in our entry "Storage Clusters - Tightly Coupled vs. Loosely Coupled," the key thing to understand is how these storage clusters deliver on their main promise and still deduplicate backups efficiently. While referencing a single target that scales seamlessly in the background is an improvement, you may also want to make sure that deduplication is applied globally across the cluster. In some cases, deduplication is done only on a per-node basis, which somewhat reduces its effectiveness because identical data that lands on different nodes is stored more than once.
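To make the per-node versus global distinction concrete, here is a minimal sketch in Python. It is not how any particular vendor implements deduplication; the fixed-size chunking, the SHA-256 fingerprints, the function names and the node names are all assumptions chosen for illustration. It simply counts how many unique chunks end up stored when each node keeps its own index versus when the cluster shares one.

```python
import hashlib
from typing import Dict, List


def chunk_fingerprints(data: bytes, chunk_size: int = 4) -> List[str]:
    """Split data into fixed-size chunks and fingerprint each one (illustrative only)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def stored_chunks_per_node(streams_by_node: Dict[str, List[bytes]]) -> int:
    """Per-node dedup: every node has a private index, so a chunk seen on two nodes is stored twice."""
    total = 0
    for streams in streams_by_node.values():
        node_index = set()
        for stream in streams:
            node_index.update(chunk_fingerprints(stream))
        total += len(node_index)
    return total


def stored_chunks_global(streams_by_node: Dict[str, List[bytes]]) -> int:
    """Global dedup: one shared index, so each unique chunk is stored once cluster-wide."""
    cluster_index = set()
    for streams in streams_by_node.values():
        for stream in streams:
            cluster_index.update(chunk_fingerprints(stream))
    return len(cluster_index)


# Hypothetical backups: two nodes receive streams that share the chunks AAAA and BBBB.
backups = {
    "node-a": [b"AAAABBBBCCCC"],
    "node-b": [b"AAAABBBBDDDD"],
}

print(stored_chunks_per_node(backups))  # 6 chunks stored
print(stored_chunks_global(backups))    # 4 chunks stored
```

In this toy case, six chunks are written to the cluster; per-node deduplication stores all six, while global deduplication stores four. The gap grows with the amount of common data that happens to land on different nodes.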
Second, some systems require that you point to a specific node in the cluster as opposed to a virtual node or control node. Neither is a deal breaker, but both are worth being aware of. My thinking is that if you want a clustered storage system that will grow with you, especially in the enterprise, then you also want deduplication and performance to improve across the whole cluster as you add nodes.
Finally, as anyone who has managed a cluster of any type knows, a cluster implies added complexity. A storage cluster is no different. Storage vendors have reduced the complexity somewhat by pre-packaging the base configurations of the cluster. If you have the time to evaluate solutions, make sure you test adding a node to the cluster. Do it yourself, from opening the box all the way through adding the node to the cluster and rebalancing storage capacity. If you don't have time to evaluate solutions, ask hard questions to make sure you understand exactly how nodes are added and what you have to do to make that happen.
As is the case with primary storage, there is no one right answer for all data centers. As a result, there is a never-ending supply of options. Single-unit deduplication systems seem to benefit from initial simplicity and potentially better energy efficiency, and they should have a cost advantage. Multi-node clusters benefit from fewer forklift upgrades and, potentially, from global deduplication.
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for datacenters across the US, he has seen the birth of such technologies as RAID, NAS, ...