Hadoop Tames Big Data for the Enterprise

Kajeepeta reports that Hadoop was inspired by MapReduce, a system designed by Google to support distributed computing on large data sets. Hadoop has many commercial flavors and is supported by a large ecosystem of tools and technologies that can help organizations tackle the broad problem of big data analytics. Several large technology companies (including Amazon.com, Facebook, IBM, Twitter and Yahoo!) and end-user companies (such as eBay, Zurich, The New York Times and Fox Network) are effectively using Hadoop to power various big data initiatives, such as enterprise search, social connections, sentiment analysis, log analysis, data mining and even supply-chain reporting.
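To make the MapReduce model concrete, the following is a minimal word-count job written against Hadoop's standard Java MapReduce API. It is a sketch only; the class names and input/output paths are illustrative and are not drawn from Kajeepeta's report.

// Minimal word-count sketch of the MapReduce model Hadoop implements.
// Class names and paths are illustrative, not from the report.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in this task's slice of the input.
  public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reduce phase: sum the per-word counts produced by all mappers.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS directory of logs
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Each mapper counts words in its local slice of the data and the reducers aggregate those partial counts in parallel, which is the divide-and-combine pattern Google described and Hadoop generalized.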

When preceded by an appropriate business case, applying Hadoop to a big data initiative can yield dramatic results for the business. Visa has said, for example, that the processing time for 73 billion transactions, amounting to 36 Tbytes of data, shrank from one month with traditional methods to a mere 13 minutes with Hadoop.

However, Kajeepeta also highlights the fact that more than just Hadoop is needed to deal with the problems of big data. He suggests that a layered approach incorporating best practices is the best path to leveraging big data. "As a set of layers needed to build any data analytics solution, the reference architecture of big data projects does look quite familiar," he said. "Where it differs from the norm is in the layers that account for large volumes of distributed, and potentially heterogeneous, data; modeling tools that deal with the rather flat (and evolving) nature of the data relationships involved; specialized scale-out analytic databases and BI suites; and niche big data analytics packages for customer and sales domains."

Hadoop supports that layered approach with capabilities that address the pragmatic needs of big data, including support for parallel and batch processing of large data sets (often many gigabytes to terabytes in size); a fault-tolerant clustered architecture; the ability to move compute power closer to the data (rather than the other way around); and the ability to foster an ecosystem of open, portable layers of enterprise architecture from the compute/data layer all the way up to the analytics layer.
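The sketch below, written against the standard org.apache.hadoop.fs.FileSystem API, shows how two of those properties (replicated, fault-tolerant storage and compute-to-data scheduling) surface to a developer. The cluster address, file path and replication factor are assumptions for illustration, not details from the report.

// Minimal sketch of writing and reading a file on HDFS with the standard
// FileSystem API. The fs.defaultFS address, path and replication factor
// are hypothetical values, not taken from the report.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // hypothetical cluster address
    conf.set("dfs.replication", "3"); // each block is stored on three nodes for fault tolerance

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/data/transactions/sample.txt");

    // Write: the client streams data; HDFS splits it into blocks and replicates them.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("sample record\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read: MapReduce tasks are normally scheduled on the nodes that hold these blocks,
    // which is what "moving compute closer to the data" means in practice.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }
  }
}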

Nevertheless, companies still need to be selective about which Hadoop projects and components they adopt, and should proceed with caution. Kajeepeta’s research indicates that a good starter set of Hadoop projects might include HDFS and HBase for data management; MapReduce and Oozie as a processing framework; Pig and Hive as development frameworks for developer productivity; and the open source Pentaho for BI.
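As a small illustration of the data-management layer in that starter set, the following sketch writes and reads one row through the standard HBase Java client. The table name, column family and row key are hypothetical, and a running HBase cluster (with hbase-site.xml on the classpath) is assumed.

// Minimal sketch of storing and fetching a row in HBase. The "transactions"
// table and "d" column family are assumed to already exist; they are
// illustrative names, not drawn from the report.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseStarter {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("transactions"))) {

      // Write one row keyed by a transaction ID.
      Put put = new Put(Bytes.toBytes("txn-0001"));
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("42.50"));
      table.put(put);

      // Read it back.
      Result result = table.get(new Get(Bytes.toBytes("txn-0001")));
      byte[] amount = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("amount"));
      System.out.println("amount = " + Bytes.toString(amount));
    }
  }
}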

Kajeepeta warns that the effective implementation and management of Hadoop requires a fair amount of expertise. If such expertise does not exist in-house, enterprises may want to partner with a service provider and/or implement one of the commercial versions of Hadoop. It is also important that companies consider the security of massive amounts of information stored in distributed clusters and potentially in public clouds. Before embarking on projects with live (and quite possibly sensitive) data, it is important to determine the security profile of the data and make the necessary provisions to protect it.

All things considered, Kajeepeta makes a strong argument for using Hadoop to get a big data analytics project started, and he effectively points out many of the pitfalls and best practices that enterprises should consider before venturing into the realm of big data analytics.
