Howard Marks

Network Computing Blogger

When Hashes Collide

If there were any doubt in my mind that data deduplication is a mainstream technology, it was wiped out when I saw, in the business section of The New York Times last week, a full-page ad from Symantec touting its deduplication technology. Even so, I still occasionally run into people who consider deduplication to be a dangerous form of black magic that is likely to mangle their data and end their careers. This attitude overestimates both the likelihood of a hash collision in deduplication and the reliability of more traditional backup media.

First, let's look at the reliability of the other components in your storage system. Today's hard drives are rated to fail to read a sector once every 10^14 to 10^16 bits read (roughly 12.5 to 1,250TB). As a backup to detect read errors and allow the array controller to rebuild the data from an Error Checking and Correction (ECC) stripe, enterprise drives add a 16-bit CRC (Cyclic Redundancy Check) in the T10 Data Integrity Field (DIF), which will itself fail to detect one in 64K (65,536) errors. As your data travels across an Ethernet or Fibre Channel network, it is error-checked using a 32-bit CRC, which will return the right value for the wrong data about 1 in 4x10^9 (2^32) times.
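If you want to check that arithmetic yourself, here is a minimal Python sketch. It assumes decimal terabytes (10^12 bytes, the way drive vendors count) and an idealized CRC in which every check value is equally likely for corrupted data; the function names are mine, for illustration only, and are not taken from any vendor's tooling.

```python
# Back-of-the-envelope conversions for the error rates quoted above.
# Assumption: decimal terabytes (10^12 bytes), as drive vendors count them.
BITS_PER_TB = 8 * 10**12


def bits_to_tb(bits):
    """Convert a count of bits read into decimal terabytes."""
    return bits / BITS_PER_TB


def crc_miss_odds(crc_bits):
    """Return N such that an idealized CRC of this width lets about
    1 in N random corruptions slip through undetected.

    Assumption: corrupted data maps to every check value with equal
    probability, so the odds are simply 1 in 2^crc_bits.
    """
    return 2 ** crc_bits


if __name__ == "__main__":
    for exponent in (14, 16):
        tb = bits_to_tb(10**exponent)
        print(f"1 unreadable sector per 10^{exponent} bits = one per {tb:,.1f} TB read")
    for width in (16, 32):
        print(f"{width}-bit CRC misses about 1 in {crc_miss_odds(width):,} errors")
```

Because the ratings are quoted in bits rather than bytes, the amount of data you can read between expected errors is eight times smaller than the raw exponent suggests.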

Finally, if you're avoiding deduplication because you don't trust it, you write the data to an LTO-5 tape, which has an error rate of one in 10^17. Well, one in 10^17 sounds great! I mean, the odds of winning the Powerball lottery are two in 10^8. LTO-5 error rates are a billion times better than that! Of course, the spec sheet also says that figure covers only non-media errors, so errors caused by tape mishandling, overuse and the like aren't included, and can't really be calculated.

So how do those reliability levels compare to a typical deduplicating backup target? Among hash-based deduplicating systems, SHA-1 is the most commonly used hash function. With a 20-byte hash value, the odds of any two blocks generating the same hash from different data are about one in 10^48, which anyone will admit is a really big number. Of course, what we're worried about is the odds of two blocks in our data center generating a hash collision, and that depends on the amount of data in the deduplication universe.
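To put a rough number on "depends on the amount of data," you can use the standard birthday-problem approximation: with n unique blocks and a b-bit hash, the chance of at least one accidental collision is about 1 - exp(-n(n-1)/2 / 2^b). The sketch below is only an illustration. It assumes SHA-1 output behaves like a uniformly random 160-bit value (accidental collisions only, not deliberate attacks), and the 8KB block size and 1PB store are hypothetical figures I picked for the example, not the specs of any particular product.

```python
import math

SHA1_BITS = 160  # SHA-1 produces a 20-byte (160-bit) hash value


def blocks_in(store_bytes, block_bytes):
    """Number of whole blocks in a store of the given size."""
    return store_bytes // block_bytes


def collision_probability(num_blocks, hash_bits=SHA1_BITS):
    """Birthday-bound estimate of at least one accidental hash collision
    among num_blocks distinct blocks.

    Assumption: hash values are uniformly random and independent.
    expm1 keeps precision when the probability is extremely small.
    """
    pairs = num_blocks * (num_blocks - 1) // 2  # distinct pairs of blocks
    return -math.expm1(-pairs / 2**hash_bits)


if __name__ == "__main__":
    # Hypothetical example: 1PB of unique data chunked into 8KB blocks.
    n = blocks_in(10**15, 8192)
    print(f"{n:,} blocks -> collision probability about {collision_probability(n):.1e}")
```

Under those assumptions the result comes out on the order of 10^-27, many orders of magnitude below the drive, CRC and tape error rates discussed earlier. Note that the probability grows with the square of the block count, so a store ten times larger is about a hundred times more likely to see a collision.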

As my friend W. Curtis Preston says, it's more likely that, on any given day, Jessica Alba will come running to me to be mine forever than that two blocks in my data will wrongly generate the same hash. The former is possible, after all. Ms. Alba and I are both alive, but, given the fact that I'm an old, fat, geeky guy in New Jersey and she's, well, Jessica Alba, it's highly improbable.  

