When discussing big data, it’s important to note that not all data is created on the fly; some data sets grow over time and remain in the organization for years, according to Michael Biddick, CEO of Fusion PPT and author of InformationWeek’s The Big Data Management Challenge. While Part 1 of our examination of big data focused on the overall market, drivers and challenges, Part 2 explores some ways to manage big data.
The first steps in managing big data are to focus on both the technical aspects and the business needs, says Biddick. "You must really understand the type of data that you are dealing with and also your budget, as big data management and governance can get very expensive."
Develop a data map that classifies the types of data important to the organization. Also consider the geographic location of the data and the stakeholders using it. If the organization has centralized data centers and remote offices, consider how users will access the data, he says. Determine whether standardized tools are needed to generate reports, or whether users need the ability to customize data on the fly.
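As a rough illustration of what such a data map might look like in practice, the short Python sketch below records each data set's type, location, stakeholders and reporting needs, then groups the data sets by location. The field names, the `group_by_location` helper and the sample entries are hypothetical, chosen only for illustration; they do not come from Biddick or the report.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataMapEntry:
    """One row of a hypothetical data map (all field names are assumptions)."""
    name: str                     # what the data set is called internally
    data_type: str                # e.g. "structured" or "unstructured"
    location: str                 # data center or office where the data lives
    stakeholders: List[str]       # groups that use the data
    needs_custom_reports: bool = False  # True if users customize data on the fly


def group_by_location(entries: List[DataMapEntry]) -> Dict[str, List[str]]:
    """Return data set names keyed by the location that holds them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        grouped[entry.location].append(entry.name)
    return dict(grouped)


# Made-up sample entries, for illustration only.
data_map = [
    DataMapEntry("web_logs", "unstructured", "central_dc", ["marketing"], True),
    DataMapEntry("sales_orders", "structured", "central_dc", ["finance", "sales"]),
    DataMapEntry("branch_inventory", "structured", "remote_office", ["operations"]),
]

by_location = group_by_location(data_map)
custom_report_sets = [e.name for e in data_map if e.needs_custom_reports]
```

Even a simple inventory like this can show which locations hold the most data and which data sets need more than standardized reporting.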
Some 30% of respondents to the survey say they’re using public cloud infrastructure or storage in production to test some applications. Another 32% have plans to use the public cloud in the future.
Biddick cautions organizations that plan to evaluate big data vendor products that "this is an emerging market segment where a lot of people are talking about different technologies and comparing vendors using different yardsticks. There are big gaps among the batch processing, stream processing, public cloud and private cloud products, and technologies on the market, as well as among the hardware and software vendors vying for a piece of the big data pie."
He says there are also converged offerings that consolidate products as services or appliances for on-premises installations. "The key is to understand your requirements and map them to the options available," Biddick says. All of the storage vendors are important in big data, including IBM Netezza, Oracle, HP and EMC Greenplum, he says. Hadoop is incorporated into many products, especially open source and cloud offerings. Other vendors, like DataXu and Splunk, offer big data tools targeted at different users.
"The key factors to weigh with the big vendors [are] selecting a turn-key solution versus one that you will need to develop and maintain," Biddick says. "The cost difference is substantial, which is why many very large organizations are using Hadoop and open source solutions."