Source vs. Target Deduplication: Scale Matters
February 03, 2011
I had a nice conversation with the CEO of a backup software vendor, who shall remain nameless, at last week's Exec Event storage industry schmooze-fest. He asked why I thought target deduplication appliances like those from Data Domain, Quantum and Sepaton were still around. Why, he wondered, doesn't everyone shift to source deduplication since it's so much more elegant?
By running in agents on the hosts, source deduplication leverages the CPU horsepower of all the hosts being backed up to do some of the heavy lifting inherent in data deduplication. This should reduce the CPU horsepower needed in the target system and thus hold down its cost. While all deduplication schemes minimize the disk space your backup data consumes, deduplicating at the source minimizes the network bandwidth required to send the backups from source to target.
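The core trick is simple: the agent fingerprints data chunks on the host and ships only the chunks the target has never seen. Here's a minimal sketch of that exchange in Python. It's illustrative only, with assumptions: fixed-size 4 KB chunks (real products like Avamar use variable-size chunking), SHA-256 fingerprints, and a `TargetIndex` class standing in for the appliance's fingerprint index.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks for simplicity; shipping products vary chunk boundaries


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Split data into chunks and fingerprint each with SHA-256 (done on the host)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


class TargetIndex:
    """Hypothetical stand-in for the dedupe target's fingerprint index."""
    def __init__(self):
        self.store = {}

    def missing(self, digests):
        # The source sends only fingerprints first; the target replies
        # with the subset it has never seen.
        return {d for d in digests if d not in self.store}

    def put(self, digest, chunk):
        self.store[digest] = chunk


def source_side_backup(data, target):
    """Back up `data`, returning the number of chunks actually shipped over the wire."""
    fingerprints = chunk_hashes(data)
    need = target.missing(d for d, _ in fingerprints)
    sent = 0
    for digest, chunk in fingerprints:
        if digest in need:
            target.put(digest, chunk)
            need.discard(digest)  # ship each unique chunk only once
            sent += 1
    return sent
```

The first full backup ships every unique chunk; a second backup of mostly-unchanged data ships almost nothing, which is exactly why the scheme is so attractive over a thin WAN link.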
Since most branch offices run a single shift--leaving servers idle for a 12-hour backup window--and WAN bandwidth from the branch office to the data center comes dear, source deduplication is a great solution to the ROBO (remote office, branch office) backup problem.
As a result, and because of the generally abysmal state of ROBO backup at the time, early vendor marketing for source deduplication products such as EMC's Avamar and Symantec's PureDisk pitched them as ROBO solutions.
Source dedupe fits well wherever CPU cycles are available during the backup window; where bandwidth is also constrained, as on a virtual server host backing up 10 guests at a time, it fits even better. Since it's just software, the price is usually right. And now that vendors have started building source deduplication into the agents for their core enterprise backup products, users don't even need to junk NetWorker, Tivoli Storage Manager or NetBackup to dedupe at the source.