Howard Marks

Network Computing Blogger


Primary Deduplication Not Good For Backup

As an industry, we have fallen into the trap of thinking of data deduplication as a single technology. When NetApp and EMC were in their bidding war for Data Domain, some analysts wondered why EMC, which already had deduplication technology in Avamar, would want Data Domain's. Now that data deduplication is gaining traction in the primary storage market, I thought I would point out that a deduplication system designed for primary data may not be as effective with backup data.

All deduplication systems work by breaking files, or other objects like virtual tapes, into smaller blocks. They then identify the blocks that contain the same data, like the corporate logo on every PowerPoint slide, and use links in their internal file system so the single block of data they store can stand in for all the other copies of that data across the file system. Breaking the data down into blocks is easy; the hard part is figuring out what block alignment will result in the best data reduction. The simplest systems, like NetApp's or ZFS deduplication, simply break each file into fixed-size blocks. This works reasonably well for primary storage file systems that hold a large number of small files, because each file starts on a block boundary. It works especially well for applications like VDI hosting, where there are a lot of duplicate files.
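The fixed-block scheme is simple enough to sketch in a few lines. This is not NetApp's or ZFS's actual implementation, just a minimal illustration of the idea: hash each fixed-size block, store one copy per unique hash, and keep references for the rest.

```python
import hashlib

def dedupe_fixed(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks; store each unique block once."""
    store = {}   # hash -> the single stored copy of that block
    refs = []    # per-block references into the store
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # keep only the first copy
        refs.append(digest)
    return store, refs

# Aligned duplicate content dedupes well: three blocks, two stored.
data = b"A" * 8192 + b"B" * 4096
store, refs = dedupe_fixed(data)
print(len(refs), "blocks referenced,", len(store), "blocks stored")
# → 3 blocks referenced, 2 blocks stored
```

Because every file starts on a block boundary, two identical files always produce identical block hashes, which is why this works well for file systems full of small or duplicated files.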

Since the vast majority of today's backup applications create a small number of what are essentially tarballs or .ZIP files when they back up to disk, deduplicating backup targets have to work harder to determine where the block boundaries are. Content-aware systems like Sepaton's and Exagrid's reverse engineer the backup application's file and/or tape formats so they can identify each source file in the stream and compare it to other copies of that file they've already stored. Other vendors have their own secret sauce, and while Data Domain's hash-based, variable block-size approach made sense when its CTO, Hugo Patterson, explained it to me last week, it's a bit too complicated to describe here.
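The core idea behind variable block-size approaches is content-defined chunking: cut wherever the data itself matches some pattern, so boundaries move with the content rather than sitting at fixed offsets. The toy below is my own illustration, not Data Domain's algorithm; real systems use Rabin fingerprints and enforce minimum and maximum chunk sizes, where this just checksums a trailing window.

```python
import hashlib, random

def chunk_content_defined(data: bytes, window: int = 16, mask: int = 0x3F):
    """Toy content-defined chunker: cut wherever a checksum of the trailing
    window hits a fixed pattern, so boundaries follow the content itself
    rather than fixed byte offsets."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        if sum(data[i - window:i]) & mask == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

random.seed(7)
original = bytes(random.randrange(256) for _ in range(4096))
shifted = b"\x00" * 513 + original   # same payload, pushed 513 bytes out

a = {hashlib.sha256(c).hexdigest() for c in chunk_content_defined(original)}
b = {hashlib.sha256(c).hexdigest() for c in chunk_content_defined(shifted)}
print(f"{len(a & b)} of {len(a)} chunks unchanged despite the 513-byte shift")
```

Because each cut decision depends only on the bytes immediately behind it, the chunk boundaries re-synchronize shortly after the inserted prefix, and every chunk except the first survives the shift intact.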

Now imagine using a simple fixed-block deduping system with a backup stream. Your backup app backs up the C: (system) and E: (data) drives of your server in a single backup job to a single virtual tape file. The system logs, which are backed up early in the process, have grown since yesterday, so all the data that follows is offset 513 bytes from where it was in yesterday's backup. While there may still be some duplicate blocks, there won't be nearly as many as if the system could reset the alignment.
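You can see how badly a fixed-block scheme handles that 513-byte shift with a quick sketch. The data here is synthetic, but the effect is the real one: once nothing lines up on a block boundary anymore, the block hashes from yesterday's backup are useless.

```python
import hashlib, random

def block_hashes(data: bytes, size: int = 4096):
    """Hash every fixed-size block of the stream."""
    return {hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

random.seed(1)
yesterday = bytes(random.randrange(256) for _ in range(64 * 1024))
today = b"\x00" * 513 + yesterday   # the logs grew: everything shifts 513 bytes

a, b = block_hashes(yesterday), block_hashes(today)
print(f"{len(a & b)} of {len(a)} fixed-size blocks match after the shift")
# → 0 of 16 fixed-size blocks match after the shift
```

Even though today's stream contains yesterday's data byte for byte, not a single 4 KB block matches, because every block boundary now falls 513 bytes away from where it did before.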

The moral of the story is that not all deduplication schemes are alike. Use primary storage deduplication with the wrong backup app, and you may not see the 20:1 data reduction you're looking for. You'll see some data reduction, but we'll have to try them in the lab to see how much. Disclosure: I am currently working on projects for NetApp and EMC/Data Domain.
