Data Deduplication Update 2 - Scalability

As we continue our tour through the deduplication maze, one of the battle cries of data deduplication suppliers is how well their product scales. Scalability, however, is in the eye of the beholder. In the deduplication system space, this results in the classic battle of the single box storage system vs. a grid or clustered storage system. In the software space it raises questions.

George Crump

October 29, 2009

Network Computing

The value of a single box system is that it is simple. You plug it in one time and it works. Problem solved. This compares to the potential challenge of a multi-node storage cluster, where you have multiple parts to put together. You have to decide whether a single box, like those offered by Data Domain and Nexsan, will be fast enough to meet your backup performance and scalability needs over time. Clustered systems, like those offered by Exagrid and Sepaton, require a bigger upfront footprint but have the potential to scale both performance and capacity over the long term.

Single unit systems have kept pace with user demands by riding the technology waves of faster processors and higher capacity hard drives. From a raw storage I/O standpoint, most of these systems can keep pace with the multi-node cluster offerings, especially for the typical data center. They may also be more power efficient than multi-node clusters and more readily able to implement MAID-like functions.

Scaling capacity beyond a single box requires the addition of a second system that has to be managed separately. You as the customer have to decide where the breaking point is for your environment. Managing two data deduplication systems is not a challenge for most, but managing ten might be a problem. In the future, I expect single box deduplication products to manage multiple systems in the background, presenting a virtual IP address to the backup server. This essentially creates a loosely coupled cluster.
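To make the loosely coupled idea concrete, here is a minimal sketch in Python. It does not describe any vendor's product; the names DedupBox, LooseCluster and route are hypothetical, and the placement rule (keep each backup client on one box so its new backups deduplicate against its own history, and send new clients to the box with the most free space) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DedupBox:
    """Hypothetical stand-in for one single box deduplication appliance."""
    name: str
    capacity_tb: float
    used_tb: float = 0.0

    def free_tb(self) -> float:
        return self.capacity_tb - self.used_tb


@dataclass
class LooseCluster:
    """One front-end address that routes backup jobs to independent boxes."""
    boxes: list[DedupBox]
    # Remembers which box each backup client was assigned to.
    placement: dict[str, str] = field(default_factory=dict)

    def route(self, client: str, estimated_tb: float) -> DedupBox:
        by_name = {box.name: box for box in self.boxes}
        if client in self.placement:
            # Assumption: a known client stays on its box, because each box
            # deduplicates only against the data it already holds.
            box = by_name[self.placement[client]]
        else:
            # Assumption: a new client goes to the box with the most free space.
            box = max(self.boxes, key=DedupBox.free_tb)
            self.placement[client] = box.name
        if box.free_tb() < estimated_tb:
            raise RuntimeError(f"{box.name} is full; add another box")
        box.used_tb += estimated_tb
        return box


cluster = LooseCluster([DedupBox("box-a", 10.0), DedupBox("box-b", 20.0)])
first = cluster.route("mail", 1.0)   # new client, most free space wins
second = cluster.route("db", 15.0)   # box-b still has the most free space
third = cluster.route("web", 2.0)    # box-a now has more room than box-b
again = cluster.route("mail", 1.0)   # known client returns to its own box
```

The backup server only ever talks to the cluster object, which plays the role of the virtual IP address. The boxes share no deduplication index, which is what makes the cluster loosely rather than tightly coupled.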

Deduplication systems typically reach capacity because of a desire to keep backup data for a long time, potentially eliminating tape. Rather than adding boxes, another option is to use a Recovery Service Provider like Simply Continuous, or even a straight archive system like those offered by Permabit, Caringo, Nexsan and others. By shifting the longer term retention of data to a dedicated archive or a provider, the local box does not need to scale. Management of different retention times on multiple boxes is available from several of these vendors, and those that support Symantec's OST have even greater flexibility and control.

If managing multiple single box deduplication systems and outsourcing the storage of older backups are concerns for you, this is where clustered or grid systems come into play, something we will delve into in our next entry, "Scaling Backup Deduplication with Clustered Storage".
