As we continue our tour through the deduplication maze, one of the battle cries of data deduplication suppliers is how well their products scale. Scalability, however, is in the eye of the beholder. In the deduplication system space, this results in the classic battle of the single-box storage system vs. a grid or clustered storage system. In the software space, it raises questions of its own.
The value of a single-box system is that it is simple. You plug it in one time and it works. Problem solved. Compare this to the potential challenge of a multi-node storage cluster, where you have multiple parts to put together. You have to decide whether a single box, like those offered by Data Domain and Nexsan, will be fast enough to meet your backup performance and scalability needs over time. Clustered systems, like those offered by ExaGrid and Sepaton, require a bigger upfront footprint but have the potential to scale both performance and capacity over the long term.
Single-box systems have kept pace with user demands by riding the technology waves of faster processors and higher-capacity hard drives. From a raw storage I/O standpoint, most of these systems can keep pace with the multi-node cluster offerings, especially in the typical data center. They may also be more power efficient than multi-node clusters and may be able to implement MAID-like functions more readily.
With a single-box system, scaling capacity beyond what one box can hold requires adding a second system that has to be managed separately. You as the customer have to decide where the breaking point is for your environment. Managing two data deduplication systems is not a challenge for most, but managing ten might be a problem. In the future, I expect single-box deduplication products to manage multiple systems in the background, presenting a single virtual IP address to the backup server. This essentially creates a loosely coupled cluster.
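To make the "breaking point" question concrete, here is a minimal back-of-the-envelope sketch. It assumes stored (post-deduplication) data grows at a fixed annual rate and that every box has the same usable capacity. The function names and all of the numbers are hypothetical and do not come from any vendor.

```python
import math

def boxes_needed(stored_tb, annual_growth, box_capacity_tb, years):
    """Project how many single-box systems are needed each year.

    stored_tb       -- post-deduplication data on disk today (hypothetical)
    annual_growth   -- yearly growth as a fraction, e.g. 0.5 for 50%
    box_capacity_tb -- usable capacity of one box (hypothetical)
    Returns a list of (year, stored_tb, box_count) tuples, year 0 = today.
    """
    plan = []
    for year in range(years + 1):
        needed = stored_tb * (1 + annual_growth) ** year
        plan.append((year, needed, math.ceil(needed / box_capacity_tb)))
    return plan

def first_year_exceeding(plan, max_boxes):
    """Return the first year the box count passes what you are willing
    to manage separately, or None if it never does within the plan."""
    for year, _, boxes in plan:
        if boxes > max_boxes:
            return year
    return None

if __name__ == "__main__":
    plan = boxes_needed(stored_tb=20, annual_growth=0.5,
                        box_capacity_tb=30, years=5)
    for year, tb, boxes in plan:
        print(f"year {year}: {tb:.1f} TB stored, {boxes} box(es)")
    print("breaking point (more than 2 boxes):",
          first_year_exceeding(plan, max_boxes=2))
```

Plugging in your own growth rate and your own tolerance for separately managed systems gives a rough sense of when a single-box approach stops being simple.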
Deduplication systems typically reach capacity because of a desire to keep backup data for a long time, potentially eliminating tape. Instead of adding more boxes, another option is to use a Recovery Service Provider like Simply Continuous, or even a straight archive system like those offered by Permabit, Caringo, Nexsan and others. By shifting the longer-term retention of data to a dedicated archive or a provider, the local box does not need to scale. Management of different retention times on multiple boxes is available from several of these vendors, and those that support Symantec's OpenStorage Technology (OST) have even greater flexibility and control.
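The idea of keeping only recent backups on the local box can be sketched as a simple age-based split. This is an illustration under stated assumptions, not how any of the vendors above implement retention: the `split_by_retention` name, the tuple layout, and the 30-day window are all hypothetical.

```python
from datetime import date, timedelta

def split_by_retention(backups, today, local_days):
    """Partition backup sets into those kept on the local box and
    those handed off to an archive system or service provider.

    backups    -- list of (name, backup_date, size_tb) tuples (hypothetical)
    local_days -- how many days of backups stay on the local box
    """
    cutoff = today - timedelta(days=local_days)
    local = [b for b in backups if b[1] >= cutoff]
    archive = [b for b in backups if b[1] < cutoff]
    return local, archive

if __name__ == "__main__":
    backups = [
        ("weekly-full-a", date(2024, 1, 30), 1.0),
        ("weekly-full-b", date(2023, 12, 1), 2.0),
        ("weekly-full-c", date(2024, 1, 1), 0.5),
    ]
    local, archive = split_by_retention(backups, date(2024, 1, 31), 30)
    print("local TB:", sum(size for _, _, size in local))
    print("archive TB:", sum(size for _, _, size in archive))
```

The local total is what the deduplication box has to hold; everything past the cutoff becomes the archive's or provider's scaling problem.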
If managing multiple single-box deduplication systems or outsourcing the storage of older backups is a concern for you, this is where clustered or grid systems come into play. We will delve into them in our next entry, "Scaling Backup Deduplication with Clustered Storage".
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for datacenters across the US, he has seen the birth of such technologies as RAID, NAS, ...