The Reality Of Primary Storage Deduplication

Should you be deduplicating your primary storage? For storage-obsessed IT, primary deduplication is too sweet a technology to ignore: eliminating duplicate data on high-priced, tier-one storage promises to cut storage costs by as much as 20:1. Deduplication for backup and offline storage is a natural fit. Still, the access demands of primary storage mean the reality of primary storage deduplication is a lot less rosy than you might expect.

With deduplication, the software breaks files down into blocks and then inspects those blocks for duplicate patterns in the data. Once a duplicate is found, the software replaces each copy of the pattern with a pointer in its file system to the initial instance. Deduplication began in backup storage, but given IT's storage worries, it was only a matter of time before the technology was applied to primary storage. Today, EMC, Ocarina, Nexenta, GreenBytes and HiFn, to name a few, are all bringing deduplication to primary storage.
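The break-into-blocks-and-point scheme described above can be sketched in a few lines. This is a toy illustration, not any vendor's implementation: it uses fixed-size blocks and a SHA-256 content hash, where real products often use variable-size chunking and more elaborate indexes.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks; real systems may use variable-size chunking


class DedupStore:
    """Toy block store: each unique block is kept once, keyed by its content hash."""

    def __init__(self):
        self.blocks = {}  # hash -> block bytes, stored exactly once

    def write(self, data: bytes) -> list:
        """Split data into blocks and store each unique block once.

        Returns the file's 'recipe': a list of block hashes, i.e. the
        pointers back to the initial instance of each pattern.
        """
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # duplicate pattern? keep only the first copy
                self.blocks[digest] = block
            recipe.append(digest)          # pointer to the stored instance
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble the file by following the pointers."""
        return b"".join(self.blocks[h] for h in recipe)
```

Writing a file whose blocks repeat stores each distinct block only once, while the recipe preserves enough information to reassemble the original byte-for-byte on read-back.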

More specifically, Network Computing contributors George Crump and Howard Marks point out three factors driving this phenomenon. First, storage is growing too fast for IT staffs to manage, so extra copies of data are inevitable: repeated data dumps, multiple versions of files and duplicate image files. Primary storage deduplication catches these instances. Second, primary storage deduplication pays off in the storage of virtualized server and desktop images, where the redundancy between image files is very high; eliminating it can save terabytes of capacity. In many cases, reading back deduplicated data carries little or no performance impact.

The third, and potentially the biggest, payoff is that deduplicating primary storage optimizes everything downstream: copies of data, backups, snapshots and even replication jobs should all require less capacity. This does not remove the need for a secondary backup; it is still a good idea to keep a stand-alone copy of data that is not tied back to any deduplication or snapshot metadata. But deduplicating data earlier in the process can reduce how often that separate device is used, especially if the primary storage system replicates to a similarly enabled system in a DR location.

Deduplicating primary storage isn't without risks and misconceptions. Deduplication ratios depend on the type of data being operated on. Backup data, for example, is highly repetitive, allowing deduplication ratios to run as high as 20:1, but those opportunities don't exist in primary storage, where ratios tend to run closer to 2:1. There's also a performance penalty with deduplication that won't be acceptable in certain situations, such as online transaction processing (OLTP) applications. Finally, as Marks points out, deduplication systems aren't all alike, and using a primary deduplication system with the wrong backup application could result in significant problems.
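The gap between the 20:1 backup ratio and the roughly 2:1 primary ratio is simple arithmetic: the ratio is just logical data divided by the unique blocks left after deduplication. A small sketch (the workloads are invented for illustration, not measured data):

```python
import hashlib


def dedup_ratio(data: bytes, block_size: int = 4096) -> float:
    """Logical size divided by the size of the unique blocks that remain."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).hexdigest() for b in blocks}
    return len(blocks) / len(unique)


# Backup-like workload: 20 nightly fulls of an unchanged block -> 20:1
backup = bytes(4096) * 20

# Primary-like workload: mostly distinct blocks, each appearing twice -> 2:1
primary = b"".join(bytes([i]) * 4096 for i in range(4)) * 2
```

Repeated full backups of mostly unchanged data collapse dramatically, while live primary data, which is mostly unique, yields far more modest savings.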
