Now That We Can Dedupe Everywhere, Where to Dedupe?


Howard Marks

November 10, 2010

3 Min Read

Data deduplication first appeared on specialized appliances designed to be used as the target of an existing backup application like NetBackup or NetWorker. My friend W. Curtis Preston recently posted a chart to his Mr. Backup Blog comparing the performance of the current generation of these appliances. While this answers some of the questions you may have about deduplicating appliances, it raises another: in an era where I can dedupe data at just about any stage in the backup process, where is the best place to do it? Today we just don't have complete answers to that question. Maybe someday we will.

For large data centers with many terabytes of data to back up every night, appliances are the way to go. If you're generating enough backup traffic to keep a high-end appliance fed, something like Data Domain's DD800 or Quantum's DXi8500 that can ingest data at 1.5GB/s or better, you're likely doing it through multiple media servers, if not multiple backup applications.

In those environments, using a single large appliance, or an array like NEC's HYDRAstor that can accept data at a mind-boggling 27GB/s (97TB/hr), gives you a single device to manage that holds all your fresh backup data. It also means all your data lands in a single, globally deduplicated storage pool, so the Windows C: drives you back up from physical servers with NetBackup and the virtual ones backed up with Veeam Backup will, if the stars align just right, result in WINSOCK.DLL being stored just once.
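If you want to picture how that global pool works, here's a minimal sketch of content-addressed deduplication in Python. The fixed 8KB chunk size, the SHA-1 hash and the in-memory dictionaries are illustrative assumptions, not how any particular appliance is built.

```python
import hashlib

CHUNK_SIZE = 8 * 1024          # illustrative fixed chunk size; real appliances often use variable chunking
store = {}                     # hash -> chunk data: the "global" pool shared by every backup stream
recipes = {}                   # backup name -> list of chunk hashes needed to rebuild that backup

def ingest(backup_name, data):
    """Split a backup stream into chunks; store each unique chunk only once."""
    hashes = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = hashlib.sha1(chunk).hexdigest()
        if h not in store:      # WINSOCK.DLL from NetBackup and from Veeam hashes identically,
            store[h] = chunk    # so the second copy is never written
        hashes.append(h)
    recipes[backup_name] = hashes

# Two "backups" from different sources containing the same file land in one pool.
winsock = b"pretend this is WINSOCK.DLL" * 1000
ingest("physical-server-C-drive", winsock)
ingest("vm-C-drive", winsock)
print(len(store), "unique chunks stored for", len(recipes), "backups")
```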

I'm less sure about smaller outfits with 2-50TB of data to protect. Should such a shop buy a Quantum DXi6510 or Data Domain DD610, which will give them 6TB of net disk space (after RAID but before deduplication) for $50,000, or use the deduplication feature of their backup software with a relatively low-end disk array from Overland Storage, Promise, Infortrend or the like?

Depending on which backup software they use, the deduplication option will add $2,000-$20,000 to their costs, and a low-end array with 12 1TB drives another $8,000-$12,000. Is the mid-range appliance worth twice the price or more? Of course, the answer depends on three things. First, how well they, or their reseller, can integrate the software and array. A system that just works is easily worth four to five times what a cranky combination of components integrated by a Geek Squad reject is.
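For a rough sense of the gap, here's the back-of-the-envelope math using the numbers above, which are ballpark prices rather than quotes:

```python
# Rough cost comparison using the figures in the paragraph above.
appliance = 50_000                      # mid-range dedupe appliance, ~6TB net before dedupe
software_dedupe = (2_000, 20_000)       # dedupe option for the backup software, vendor-dependent
array = (8_000, 12_000)                 # low-end 12 x 1TB disk array

diy_low = software_dedupe[0] + array[0]     # $10,000
diy_high = software_dedupe[1] + array[1]    # $32,000
print(f"DIY route: ${diy_low:,} - ${diy_high:,}")
print(f"Appliance premium: {appliance / diy_high:.1f}x - {appliance / diy_low:.1f}x")
```

Run it and the appliance premium works out to somewhere between roughly 1.6x and 5x, depending mostly on what your backup vendor charges for the dedupe option.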

Next comes the replication question. If this is a branch office that's going to be replicating data to HQ, use the same brand of appliance as headquarters. This is not the place to be a hero and show the home office how much money they waste. Replicating from one Data Domain or Quantum box to another is easy and bandwidth-efficient. The alternative is shipping tapes, which are sure to get lost because the guys in the home office you're shipping them to don't want them anyway.

The open question is how fast media server deduplication really is. Appliance vendors tout their performance, but I haven't seen the backup software vendors talking about it. How fast can Backup Exec or Simpana dedupe data during a backup? What kind of disk do you need to get reasonable performance? Do you need SSDs to hold the hash index the way some appliances do?
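One way to see why the hash index matters: every incoming chunk needs at least one index lookup before the system can decide to store it or skip it, so the lookup rate scales directly with ingest speed. Here's a rough estimate, assuming an 8KB average chunk size purely for illustration (real products vary):

```python
def index_lookups_per_second(ingest_bytes_per_sec, avg_chunk_bytes):
    """Each incoming chunk needs at least one hash-index lookup before it can be
    written or skipped, so lookups/sec scales with ingest rate / chunk size."""
    return ingest_bytes_per_sec / avg_chunk_bytes

GB = 10**9
for rate_gbps in (0.2, 1.5):                     # media-server-class vs. high-end-appliance-class ingest
    lookups = index_lookups_per_second(rate_gbps * GB, 8 * 1024)
    print(f"{rate_gbps} GB/s ingest -> ~{lookups:,.0f} index lookups/sec")
# At 1.5 GB/s that's ~183,000 lookups/sec, far more random I/O than a handful of
# SATA spindles can deliver, which is why some designs keep the index in RAM or on SSD.
```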

Those answers are hard to come by, but you have to ask your prospective vendors to pony up the data to support their performance claims, and then compare the costs of adding deduplication to your storage fabric.

Disclaimer: Overland Storage and Symantec are clients of DeepStorage.net. Neither has commissioned backup benchmarking.

About the Author

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at DeepStorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real-world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at: http://www.deepstorage.net/NEW/GBoS
