Howard Marks
Commentary

Hadoop And Enterprise Storage?

Both NetApp and EMC have announced that they're turning the turrets of their marketing battleships toward the market for Apache Hadoop, the framework that provides the back end to many Web 2.0 implementations. While I understand why Hadoop is attractive to these storage vendors--after all, a typical Hadoop cluster will have hundreds of gigabytes of data--I'm not sure I buy that Hadoop users need enterprise-class storage.

EMC's Greenplum division is introducing its own Hadoop distributions: an all-open-source community edition and a ruggedized enterprise edition. These will be available as standalone software and preinstalled on the Greenplum HD Data Computing Appliance, which uses SATA drives in a JBOD configuration. However, since it's from EMC, it will certainly cost more than using Supermicro servers and Western Digital drives from Newegg.

NetApp is pitching the concept of shared DAS, connecting the Engenio RAID arrays it just bought from LSI (now rebranded as the E-Series) to Hadoop nodes over SAS. NetApp is pushing the low-end E2600 array for Hadoop clusters.

The key to these announcements may lie in Informatica CTO James Markarian's statement, delivered from a stage in the EMC World press room, that some companies are more willing to adopt new technologies like Hadoop if they can buy them from trusted suppliers such as EMC.

Personally, I'm not so sure. To get the full benefits of the Web 2.0 architecture, organizations may have to--for those applications where it's appropriate--adopt the whole Web 2.0 toolkit and design model. The Hadoop Distributed File System (HDFS) is designed to distribute data across multiple nodes, replicating each block across several of them (three copies by default) so the cluster can survive node failures without losing data, or even access to it. This lets Web 2.0 site operators use large clusters of very inexpensive nodes with SATA JBODs to store their data and process it at a very low cost per gigabyte.
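
To make that concrete, here's a minimal sketch of how that redundancy is handled in software rather than in an enterprise array, using the standard Hadoop Java client API. The file path is hypothetical, and it assumes a running cluster whose data nodes already point at their JBOD directories.

// A minimal sketch: HDFS redundancy lives in the client/cluster configuration,
// not in a RAID controller. The path below is hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Keep three copies of every block on different nodes, so losing a
        // node (or one SATA drive in its JBOD) costs neither data nor access.
        conf.set("dfs.replication", "3");

        FileSystem fs = FileSystem.get(conf);

        // Replication can also be adjusted per file after it's written.
        fs.setReplication(new Path("/weblogs/2011-05-20.log"), (short) 3);
    }
}

The point is that the redundancy is a property of the file system itself, which is why a tray of commodity SATA drives can stand in for RAID-protected enterprise storage in these clusters.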

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage ...