05/10/2011 4:00 PM

Hadoop And Enterprise Storage?

Both NetApp and EMC have announced that they're turning the turrets of their marketing battleships toward the market for Apache Hadoop, which provides the back end for many Web 2.0 implementations. I understand why Hadoop is attractive to these storage vendors (after all, a typical Hadoop cluster holds hundreds of terabytes of data), but I'm not sure I buy that Hadoop users need enterprise-class storage.

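Hadoop is built on a fault-tolerant model: its file system, HDFS, stores each block of data on several inexpensive servers (three by default), so the cluster keeps working when a drive, or even an entire node, fails. Enterprise storage, on the other hand, is based on a "failure is not an option" model rather than a fault-tolerant one. Controllers, drives and even drive enclosures are designed for long mean times between failures. That reliability, of course, costs money, so you pay more per gigabyte for a Vplex (or even a Clariion) than Google does for its MicroATX motherboards with SATA drives all but duct-taped to them.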
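The difference between the two models comes down to simple probability. The sketch below is a minimal illustration, not a reliability model of any real product: it assumes copies fail independently, ignores the fact that HDFS rebuilds lost copies, and uses hypothetical failure probabilities (`COMMODITY_FAILURE`, `ENTERPRISE_FAILURE`) chosen only to show the shape of the trade-off.

```python
# Hypothetical annual failure probabilities for illustration only;
# these are NOT measured figures for any real hardware.
COMMODITY_FAILURE = 0.05    # assumed: one cheap server with SATA drives
ENTERPRISE_FAILURE = 0.001  # assumed: one highly redundant enterprise array


def prob_all_copies_lost(per_copy_failure: float, copies: int) -> float:
    """Probability that every copy of a block is lost.

    Assumes copies fail independently and that lost copies are never
    rebuilt, a pessimistic simplification for HDFS, which re-replicates
    under-replicated blocks.
    """
    if not 0.0 <= per_copy_failure <= 1.0:
        raise ValueError("per_copy_failure must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return per_copy_failure ** copies


def main() -> None:
    # HDFS keeps three copies of each block by default.
    hadoop = prob_all_copies_lost(COMMODITY_FAILURE, copies=3)
    # The enterprise array is treated as a single, very reliable copy.
    enterprise = prob_all_copies_lost(ENTERPRISE_FAILURE, copies=1)
    print(f"3 commodity copies: {hadoop:.6f}")
    print(f"1 enterprise copy:  {enterprise:.6f}")


if __name__ == "__main__":
    main()
```

With these made-up inputs, losing all three cheap copies (0.05 cubed, or 0.000125) is less likely than losing the single far more reliable copy (0.001). Real-world numbers depend on rebuild times, correlated failures and much else.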

To understand how different the enterprise and Web 2.0 models are, think for a moment of an engineering school egg drop contest. The rules state that each team must get a dozen eggs, unbroken, from the roof of the engineering building to a team cooking omelets on the quad. Teams will be judged on cost, speed and originality.

The enterprise team builds a dumbwaiter to gently lower the eggs, still in their supermarket carton, down to the quad. The Hadoop team buys three dozen eggs and a roll of bubble wrap, wraps each egg, and throws the eggs off the roof. As long as one-third of its eggs arrive unbroken, the Hadoop team has solved the problem, and it has spent $30 to $40 (compared with the hundreds of dollars the enterprise team needed for dumbwaiter parts).

I can just see some application group deciding that Hadoop will help it process the deluge of data in its data center. The proposal finally reaches the storage group, which looks at the low-cost and (to the storage guy's eye) low-reliability storage in the proposal and says, "This should go on our SAN so we can provide the five-nines reliability that enterprise applications require." The project goes ahead with storage on the Symmetrix, and while it works fine, the organization doesn't see the cost savings it expected, because it's spending several times as much on storage as it needed to.
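The arithmetic behind that lost saving is easy to sketch. HDFS's replication factor (the `dfs.replication` setting, 3 by default) applies regardless of what storage sits underneath the cluster, so if the team keeps the default on SAN-backed volumes, every usable terabyte is paid for three times at enterprise prices. The per-terabyte prices below are hypothetical placeholders, not vendor quotes.

```python
# Hypothetical raw prices per terabyte, for illustration only (not quotes).
COMMODITY_RAW_PER_TB = 100.0   # assumed: SATA drives inside cluster nodes
SAN_RAW_PER_TB = 1000.0        # assumed: capacity on an enterprise array


def cost_per_usable_tb(raw_cost_per_tb: float, copies: int) -> float:
    """Cost of one usable terabyte when each byte is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return raw_cost_per_tb * copies


def main() -> None:
    baseline = cost_per_usable_tb(COMMODITY_RAW_PER_TB, copies=3)
    scenarios = [
        ("Commodity disks, 3 HDFS copies", baseline),
        # Scenario: replication lowered to 1, relying on the array's own RAID.
        ("SAN, 1 HDFS copy", cost_per_usable_tb(SAN_RAW_PER_TB, copies=1)),
        # Scenario: default replication left in place on SAN-backed volumes.
        ("SAN, 3 HDFS copies", cost_per_usable_tb(SAN_RAW_PER_TB, copies=3)),
    ]
    for label, cost in scenarios:
        print(f"{label}: ${cost:,.0f} per usable TB ({cost / baseline:.1f}x)")


if __name__ == "__main__":
    main()
```

With these placeholder prices, the SAN options cost roughly 3.3 to 10 times as much per usable terabyte as the commodity cluster; the real multiple depends entirely on actual prices.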

