Experts Share De-Dupe Insights

Experts discuss the potential pitfalls of deploying de-dupe technology

July 12, 2007

Data de-duplication may be grabbing the headlines, but analysts and users are urging a cautious approach to the much-hyped technology.

De-dupe, which aims to reduce the bulk of backed-up data by ensuring that the same information is not stored in two places, has prompted a product blitz from vendors in recent months. (See Quantum Intros GoVault, Top Storage Predictions for 2007, Insider: De-Dupe Demystified, ExaGrid Extends Backup Support, and EMC Talks Disk & De-Dupe.) At the same time, some users have deployed the technology as a way to control exploding data. (See Timecruiser and Users Look Ahead to 2007.)
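
As a rough illustration of the idea of not storing the same information twice, the following is a minimal sketch in Python, not any vendor's implementation. It assumes fixed-size chunks identified by SHA-256 digests; the class name `DedupeStore`, the constant `CHUNK_SIZE`, and the method names are hypothetical and chosen only for this example. Commercial products may use variable-size chunking and other techniques.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (illustrative only)


class DedupeStore:
    """Toy content-addressed store: each unique chunk is kept only once."""

    def __init__(self):
        self.chunks = {}        # digest -> chunk bytes (unique chunks only)
        self.logical_bytes = 0  # total bytes handed to write()

    def write(self, data: bytes) -> list:
        """Store data and return the list of chunk digests needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk seen before is not stored again; only its digest is recorded.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Bytes actually held after duplicates are eliminated."""
        return sum(len(c) for c in self.chunks.values())

    def ratio(self) -> float:
        """Reduction ratio (logical bytes per stored byte); 0.0 if empty."""
        stored = self.stored_bytes()
        return self.logical_bytes / stored if stored else 0.0


if __name__ == "__main__":
    store = DedupeStore()
    backup = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
    store.write(backup)  # first full backup
    store.write(backup)  # identical second backup adds no new chunks
    print(store.logical_bytes, store.stored_bytes(), store.ratio())
```

In this sketch, backing up the same 8 KB twice leaves 8 KB on disk for 16 KB of logical data, a 2:1 ratio. Repeated full backups of largely unchanged data are the kind of workload where the much higher ratios quoted by vendors become plausible.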

Other potential adopters are confronted by a bewildering array of technologies and approaches to data reduction. Experts have a series of suggestions for navigating the maze:

  • Check the alternatives. Despite a blaze of publicity surrounding de-dupe, it does not fit all needs. "De-dupe is not the only solution for reducing data," says StorageIO Group analyst Greg Schulz, adding that users should consider other technologies for streamlining their data footprint.

    In particular, the analyst identifies data compression technology from the likes of Storewiz as a valid alternative. (See Storewiz Notches $9M, Storewiz Bolsters Compression, and Stealthy Ocarina to Add Compression.) "Compression and compaction might give you a lower data reduction ratio than de-dupe, but it will give you that ratio consistently," he says, explaining that users can typically get a more reliable level of performance from compression.

    And Storewiz also reduces the bulk of primary storage, not just backup. Still, it touts compression ratios between 2:1 and 5:1, depending on the type of data and application, compared to ratios anywhere between 20:1 and 50:1 for de-duplication. (See Storewiz Bolsters Compression, Storewiz Intros Product Line, Sepaton Adds De-Dupe to VTL, and Analysis: Data De-Duping.)

  • Assess your needs. Before considering de-duplication, users should think seriously about what they are looking to achieve. Will de-duplication be used for backup, for archiving, or both? It may also help to have some uniformity in the data itself before considering de-dupe, according to Schulz. "The key to de-dupe is having the same or similar data," he says, explaining that data with a lot of duplicate information, such as names and addresses, is the easiest to shrink.

  • Choose a vehicle. Vendors are currently touting a diverse set of de-duplication technologies, using both hardware and software. Data Domain and Quantum, for example, use specialized de-dupe appliances, whereas Asigra and Avamar, now part of EMC, offer a software-based approach. (See Quantum to Offer De-Dupe Duo, Asigra Protects Consultancy, and EMC Picks Up Avamar.) A third group of vendors, which includes FalconStor and Sepaton, uses a VTL for de-duplicating data. (See FalconStor Picked by Publisher, FalconStor Goes Nordic, and FalconStor Extends VTL.)

    "My preference is to have it built into the software so that I can use whatever software I want," says Marc Staimer, president of analyst firm Dragon Slayer Consulting, who believes that VTL and appliance-based de-duplication can be restrictive. "It limits my flexibility, it means that if I want to go with someone else's VTL, it will be more difficult."

  • Inline versus post-processing. There are currently a couple of key approaches to de-duplication: inline processing, which is offered by the likes of Data Domain and Diligent, among others; and post-processing, which is offered by Sepaton and FalconStor. (See Data Domain Unveils DD580, Diligent Breaks Record, and Sepaton, Hifn Partner.) At this stage, only one vendor, Quantum, offers both approaches on a single appliance. (See Quantum Intros DXi7500.)

    Inline processing takes place as data is being received from the backup servers and before it is stored to disk, so no separate de-duplication pass is needed afterward. Post-processing, as its name suggests, occurs after the backup has been written to disk.

    In the current Byte and Switch de-duplication poll, IT managers were split evenly between inline and post-processing, although a further third said they would like to deploy a mixture of the two technologies.

    Post-processing is less likely to slow down backups, but it is seen as a better fit for enterprises that have extra disk capacity to hold the data until it is de-duped.

  • Consider the security implications. De-duplication, with its emphasis on single instance storage, brings with it a whole new set of security considerations, according to James Wang, CTO of Fairfield, N.J.-based education service provider Timecruiser, which has deployed de-dupe technology from FalconStor. "When you're de-duping, you're losing the amount of copies that you have," he says. "You only have a single copy, so you have to guard it very carefully."

    In the past Timecruiser performed full copies of all of its data on a daily basis, whereas now, only a portion of the data is copied. In an attempt to add another layer of security into this process, Wang and his team are looking to add remote replication to their de-dupe infrastructure, possibly replicating data to an offsite device.
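
The reduction ratios and the inline versus post-processing trade-off discussed in the list above can be put into rough numbers. The sketch below is a simplified model, not a sizing tool: the function names `space_saved` and `peak_disk_tb` are hypothetical, and the post-processing case assumes that the full backup and its de-duped copy briefly sit on disk at the same time.

```python
def space_saved(ratio: float) -> float:
    """Fraction of capacity saved at a reduction ratio of `ratio`:1."""
    if ratio < 1:
        raise ValueError("reduction ratio must be at least 1")
    return 1.0 - 1.0 / ratio


def peak_disk_tb(backup_tb: float, ratio: float, inline: bool) -> float:
    """Peak disk needed for one backup under a simplified model.

    Inline: only the reduced data ever lands on disk.
    Post-processing (assumption): the full backup lands first and
    coexists with the reduced copy until de-duplication finishes.
    """
    reduced = backup_tb / ratio
    if inline:
        return reduced
    return backup_tb + reduced


if __name__ == "__main__":
    for ratio in (2, 5, 20, 50):
        print(f"{ratio}:1 saves {space_saved(ratio):.0%}")
    print(peak_disk_tb(10, 20, inline=True))
    print(peak_disk_tb(10, 20, inline=False))
```

Under this model, a 2:1 ratio saves 50 percent of capacity and 5:1 saves 80 percent, while 20:1 saves 95 percent and 50:1 saves 98 percent, so the gain from each further increase in ratio shrinks. A 10 TB backup at 20:1 needs 0.5 TB when processed inline but 10.5 TB at peak when post-processed, which is why spare disk capacity matters for the latter.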

James Rogers, Senior Editor, Byte and Switch

  • Asigra Inc.

  • Avamar Technologies Inc.

  • Data Domain Inc. (Nasdaq: DDUP)

  • Diligent Technologies Corp.

  • EMC Corp. (NYSE: EMC)

  • ExaGrid Systems Inc.

  • FalconStor Software Inc. (Nasdaq: FALC)

  • Osterman Research

  • Quantum Corp. (NYSE: QTM)

  • Sepaton Inc.

  • The StorageIO Group

  • Storewiz Inc.
