04/23/2012 8:25 AM

How To Tackle The Big Data Challenge (Part 1)

Big data is a term that gets bandied about a lot these days. It describes the ever-growing volume of information inside organizations, driven in part by the growth of social media. According to the InformationWeek Research: The Big Data Management Challenge survey of technology professionals, the five top data drivers, regardless of industry, are financial transactions, email, imaging data, Web logs, and Internet text and documents. The two main benefits of big data management are being able

One of the challenges of big data is real-time processing, especially in dynamic data environments such as financial trading and social media, Biddick says. "Many queries are difficult to pre-compute and too intense to compute in real time on a single machine. Traditionally, you have to do an approximation to keep the cost of such a query down." He says that Storm, the open-source software from BackType (which Twitter bought last summer), performs distributed real-time processing that lets Twitter users track trends and determine how many unique people see a tweet.
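
To make the "unique people" query concrete, here is a minimal Python sketch of the general idea: split the work into partitions, process the partitions in parallel, and merge the partial results. It is not Storm code. The FOLLOWERS data and the function names are hypothetical, and local threads stand in for the separate machines a real cluster would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: the followers of each account that shared a tweet.
FOLLOWERS = {
    "alice": {"dan", "erin", "frank"},
    "bob": {"erin", "grace"},
    "carol": {"frank", "grace", "heidi"},
}


def partial_reach(accounts):
    """Collect the distinct followers for one partition of accounts."""
    seen = set()
    for account in accounts:
        seen |= FOLLOWERS.get(account, set())
    return seen


def unique_reach(accounts, workers=2):
    """Count distinct followers by processing partitions in parallel."""
    accounts = list(accounts)
    # Round-robin split: each worker gets every workers-th account.
    partitions = [accounts[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_reach, partitions))
    # Merging with a set union removes people who follow several accounts.
    reach = set()
    for part in partials:
        reach |= part
    return len(reach)


if __name__ == "__main__":
    print(unique_reach(["alice", "bob", "carol"]))
```

The merge step is where the cost lies: followers who overlap between accounts must be counted once, so the partial results have to be combined as sets rather than simply summed.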

"Storm’s architecture uses distributed remote procedure calls, so as you run a processing topology, it implements the RPC function and waits for RPC invocations," says Biddick. "An RPC invocation is a message containing the parameters of the RPC request and information telling Storm where to send the results. The topology picks up messages, does the necessary computations in parallel on several machines and returns the results to the request originator."
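
The flow Biddick describes can be sketched in plain Python. This is a simplified simulation under stated assumptions, not Storm's actual API: the Invocation class, run_topology and call_all are hypothetical names, in-process queues stand in for the network, and worker threads stand in for machines.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Invocation:
    """One RPC request: its parameters plus where to send the result."""
    request_id: int
    args: tuple
    reply_to: queue.Queue


def run_topology(requests, function):
    """Worker loop: pick up invocations, compute, reply to the originator."""
    while True:
        invocation = requests.get()
        if invocation is None:  # sentinel: no more work for this worker
            break
        result = function(*invocation.args)
        invocation.reply_to.put((invocation.request_id, result))


def call_all(function, arg_list, workers=3):
    """Send one invocation per argument tuple and gather results in order."""
    requests, replies = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=run_topology, args=(requests, function))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for request_id, args in enumerate(arg_list):
        requests.put(Invocation(request_id, args, replies))
    # Replies may arrive in any order, so match them up by request id.
    results = dict(replies.get(timeout=5) for _ in arg_list)
    for _ in threads:
        requests.put(None)
    for thread in threads:
        thread.join()
    return [results[i] for i in range(len(arg_list))]


def add(a, b):
    return a + b


if __name__ == "__main__":
    print(call_all(add, [(1, 2), (3, 4), (5, 6)]))
```

Carrying the return address inside each message is what lets any worker handle any request: the worker needs no prior knowledge of who asked.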

He says Storm’s distributed, fault-tolerant approach operates at a higher level of abstraction than message queues. Yahoo’s S4 and Amazon Web Services take similar approaches, Biddick adds. AWS is also developing a stream processing capability that it says will process more than 2 million records per second at launch and eventually scale to more than 100 times that traffic, or upward of 200 million records per second. The company describes the platform as providing near-real-time, highly available and reliable data processing.

Another issue companies need to think about is the ability to access big data, and to access it quickly. "Before thinking about big data architectures, make sure your data policies are clear and accepted throughout the organization," advises Biddick. "They must define the types of data that will be stored, for how long, how quickly you need to access it, and how it will be accessed. These policies will form the basis of storage governance and help define your technology requirements."
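
One way to make such a policy explicit is to record it as data that tools can check. The Python sketch below is only an illustration: the DataPolicy fields mirror the four items Biddick lists, but the field names, the tier names and the latency thresholds are all assumptions, not recommendations from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    """One policy entry covering the four points in the quote above."""
    data_type: str             # what kind of data is stored
    retention_days: int        # for how long it is kept
    max_access_seconds: float  # how quickly it must be retrievable
    access_method: str         # how it will be accessed


def is_expired(policy, age_days):
    """True when data is older than the policy's retention period."""
    return age_days > policy.retention_days


def storage_tier(policy):
    """Map the access-time requirement to a storage tier.

    The thresholds and tier names are hypothetical placeholders.
    """
    if policy.max_access_seconds <= 1:
        return "fast"
    if policy.max_access_seconds <= 3600:
        return "standard"
    return "archive"


if __name__ == "__main__":
    web_logs = DataPolicy("web logs", 90, 0.5, "SQL query")
    print(storage_tier(web_logs), is_expired(web_logs, 120))
```

Written this way, the policy can drive the technology requirement directly: the access-time field determines which class of storage a data set needs.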

Without this foundation, he says, companies will just be throwing storage dollars at problems and end up with a depleted budget, underutilized technology and an inability to plan for future growth. "Big data management," says Biddick, "is challenging enough without worrying about whether you’re managing the right data set."

Learn more about Research: The Big Data Management Challenge by subscribing to Network Computing Pro Reports (free, registration required).
