Hadoop Tames Big Data for the Enterprise
Frank J. Ohlhorst
March 23, 2012
Taming big data is becoming one of the most important chores in enterprise data centers today. After all, massive amounts of business data can hold massive amounts of value, especially if they are mined correctly. Uncovering that value is quickly becoming one of the cornerstones of business analytics. However, big data analytics can become an insurmountable challenge if you don't have the correct tools.
Network Computing's sister publication, InformationWeek, recently published an in-depth strategy study that made recommendations for solving the problems of big data. The report's author, Sreedhar Kajeepeta, identified the value behind Hadoop, MapReduce, NoSQL and other technologies designed to help enterprises deal with massive amounts of data, both structured and unstructured.
Kajeepeta also identified many of the sources to leverage for big data--in other words, where to look to find the hidden value. He suggested leveraging data sets such as those that include historic credit card transactions, phone call records, utility metering and bills, travel bookings and schedules, weather forecasts and readings, real estate transactions and trades of financial securities. Of course, the well of data could almost be unlimited, especially when one considers including data sets of commercial value (from the so-called democratization of large data); data generated by mobile, barcode and RFID devices; and, lately, data related to genomics.
However, Kajeepeta argues that the volume of data involved is the root cause of the analytics challenge. He wrote, "Companies are challenged to find the needle in the information haystack--be it an operational issue or a growth vector--in weeks or months, which is at least four times as fast as traditional data warehousing solutions, assuming that the conventional methods can even attempt to go after some of the complexities involved in handling the nuances of unstructured data."
A recent study by the Enterprise Strategy Group--which defines big data as data sets that exceed the boundaries and sizes of normal processing capabilities, forcing organizations to take a non-traditional approach--says the cure can be almost as painful as the problem. Managing big data is an issue because the platforms are expensive and require new server and storage purchases, training in new technologies, building up an analytics toolset, and finding people with the expertise in dealing with it.
Six percent of respondents to the ESG study said that big data was the most important IT priority; 45% said it was one of the top five IT priorities. As for the data analytics challenges, 47% named data integration complexity, 34% cited the lack of skills necessary to properly manage large data sets and derive value from them, 29% cited data set sizes that limited the ability to perform analytics, and 28% cited difficulty in completing analytics within a reasonable period of time.
Another study from Infineta Systems, a provider of WAN optimization systems for big traffic, found that data center-to-data center connectivity is a "silent killer" for big data deployments. And a third recent study from IDC, IDC Predictions 2012: Competing for 2020, said big data analytics technologies will be one of the driving forces for IT spending through 2020.
Kajeepeta suggests that the answer to managing complex, large, unstructured data sets depends on choosing the appropriate platform and associated tools--namely, Hadoop, an open-source platform for distributed storage and processing of big data.
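The MapReduce model that underpins Hadoop splits an analytics job into a map step, which emits key-value pairs from raw records, and a reduce step, which aggregates the values for each key. As a rough illustration of the pattern only--this is plain Python, not actual Hadoop code, and the log lines are invented for the example--a minimal word count looks like this:

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit a (word, 1) pair for every word in an input record.
    for word in record.lower().split():
        yield (word, 1)

def reduce_phase(key, values):
    # Reduce: sum the counts emitted for each word.
    return (key, sum(values))

def mapreduce(records):
    # Shuffle: group intermediate pairs by key, as the framework
    # does between the map and reduce stages.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

# Hypothetical unstructured input: a handful of server log lines.
logs = ["error disk full", "error network timeout", "disk replaced"]
print(mapreduce(logs))
# → {'error': 2, 'disk': 2, 'full': 1, 'network': 1, 'timeout': 1, 'replaced': 1}
```

Hadoop's contribution is running the same map and reduce logic in parallel across a cluster, so the pattern scales to data sets far beyond a single machine's capacity.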