How To Tackle The Big Data Challenge (Part 1)

Big data is a term getting bandied about a lot these days. It describes the phenomenon of information that keeps growing in organizations, thanks in part to the growth in social media. According to the InformationWeek Research: The Big Data Management Challenge survey of technology professionals, regardless of industry, the five top data drivers are financial transactions, email, imaging data, Web logs, and Internet text and documents. The two main benefits of big data management are being able

One of the challenges of big data is real-time processing, especially in dynamic data environments such as financial trading and social media, Biddick says. "Many queries are difficult to pre-compute and too intense to compute in real time on a single machine. Traditionally, you have to do an approximation to keep the cost of such a query down." He says that Storm, open-source software from BackType, which Twitter bought last summer, does distributed real-time processing of information that enables Twitter users to track trends and figure out how many unique people see a tweet.
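To make the "unique people who see a tweet" query concrete, here is a minimal sketch of the underlying idea: split the work into partitions, build a distinct set in each, then merge. It is not Storm code. The follower data, the function names, and the use of local threads in place of separate machines are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: each account mapped to its followers.
FOLLOWERS = {
    "alice": ["dave", "erin", "frank"],
    "bob": ["erin", "grace"],
    "carol": ["dave", "grace", "heidi"],
}


def followers_of(user):
    """Stand-in for a lookup that would hit a database in a real system."""
    return FOLLOWERS.get(user, [])


def partial_unique(users):
    """One worker builds the distinct follower set for its share of users."""
    seen = set()
    for user in users:
        seen.update(followers_of(user))
    return seen


def reach(tweeters, workers=2):
    """Count the distinct people who could have seen a tweet."""
    # Deal the accounts out round-robin, one chunk per worker.
    chunks = [tweeters[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sets = list(pool.map(partial_unique, chunks))
    # Merging the partial sets removes followers shared across chunks.
    return len(set().union(*partial_sets))


print(reach(["alice", "bob", "carol"]))  # prints 5
```

The merge step is what makes the count exact: "erin", "dave" and "grace" each follow two of the accounts but are counted once. On a single machine with millions of followers per account, that set would be too large to build quickly, which is the cost the quote refers to.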

"Storm’s architecture uses distributed remote procedure calls, so as you run a processing topology, it implements the RPC function and waits for RPC invocations," says Biddick. "An RPC invocation is a message containing the parameters of the RPC request and information telling Storm where to send the results. The topology picks up messages, does the necessary computations in parallel on several machines and returns the results to the request originator."
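The message flow Biddick describes can be sketched in a few lines. This is an illustration of the pattern only, not Storm's API: the `Invocation` class, the `serve` and `call` functions, and the toy computation are hypothetical, and a local thread pool stands in for "several machines."

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Invocation:
    request_id: int        # lets the originator match a result to its request
    args: list             # the parameters of the RPC request
    reply_to: queue.Queue  # tells the server where to send the result


def total_length(words, pool):
    """Toy computation: fan the work out, then combine the partial results."""
    return sum(pool.map(len, words))


def serve(requests, pool):
    """Wait for invocations, compute each one, and reply to the originator."""
    while True:
        inv = requests.get()
        if inv is None:  # sentinel value: shut down
            break
        inv.reply_to.put((inv.request_id, total_length(inv.args, pool)))


def call(requests, request_id, args):
    """Client side: send an invocation and block until the result arrives."""
    reply = queue.Queue()
    requests.put(Invocation(request_id, args, reply))
    return reply.get(timeout=5)


def demo(args):
    """Start a server, make one call, and shut the server down again."""
    requests = queue.Queue()
    with ThreadPoolExecutor(max_workers=4) as pool:
        server = threading.Thread(target=serve, args=(requests, pool))
        server.start()
        try:
            return call(requests, 1, args)
        finally:
            requests.put(None)
            server.join()


print(demo(["storm", "topology"]))  # prints (1, 13)
```

The caller never talks to a worker directly. It only places a message on the request queue and waits on its own reply queue, which is why the invocation has to carry the return address along with the parameters.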

He says Storm’s distributed, fault-tolerant approach operates at a higher level of abstraction than message queues. Yahoo’s S4 and Amazon Web Services take similar approaches, Biddick adds. And AWS is developing a stream processing capability that it says will process more than 2 million records per second at launch and eventually will scale to handle more than 100 times that traffic. The company describes the platform as providing near-real-time, highly available and reliable data processing.

Another issue companies need to think about is the ability to access big data, and to access it quickly. "Before thinking about big data architectures, make sure your data policies are clear and accepted throughout the organization," advises Biddick. "They must define the types of data that will be stored, for how long, how quickly you need to access it, and how it will be accessed. These policies will form the basis of storage governance and help define your technology requirements."
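One way to picture such a policy is as a small record with one field per question Biddick lists. The sketch below is hypothetical throughout: the field names, the retention periods, the latency thresholds, and the "hot/warm/archive" tier names are assumptions chosen for illustration, not recommendations from the report.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataPolicy:
    data_type: str                  # what kind of data is stored
    retention: timedelta            # for how long it is kept
    max_access_latency: timedelta   # how quickly it must be retrievable
    access_method: str              # how it will be accessed


def storage_tier(policy):
    """Map an access requirement to a storage tier (thresholds are assumed)."""
    if policy.max_access_latency <= timedelta(seconds=1):
        return "hot"
    if policy.max_access_latency <= timedelta(hours=1):
        return "warm"
    return "archive"


# Example policies with made-up values.
POLICIES = [
    DataPolicy("financial transactions", timedelta(days=7 * 365),
               timedelta(milliseconds=200), "SQL query"),
    DataPolicy("web logs", timedelta(days=90),
               timedelta(minutes=10), "batch job"),
    DataPolicy("email", timedelta(days=3 * 365),
               timedelta(days=1), "search on request"),
]

for p in POLICIES:
    print(f"{p.data_type}: {storage_tier(p)}")
```

Writing the policy down first means the technology choice follows from stated requirements, which is the ordering the quote argues for.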

Without this foundation, he says, companies will just be throwing storage dollars at problems and end up with a depleted budget, underutilized technology and an inability to plan for future growth. "Big data management," says Biddick, "is challenging enough without worrying about whether you’re managing the right data set."

Learn more about Research: The Big Data Management Challenge by subscribing to Network Computing Pro Reports (free, registration required).

4/30/2012 | 2:54:56 PM
re: How To Tackle The Big Data Challenge (Part 1)
Great article, Esther. It is worth mentioning the HPCC Systems platform as a great fit for tackling the Big Data Challenge. Unlike Hadoop distributions, HPCC is a mature platform and provides a data delivery engine together with a data transformation and linking system equivalent to Hadoop. The main advantages over other alternatives are the real-time delivery of data queries and the extremely powerful ECL language programming model. Also, the ROI for HPCC is significantly better than Hadoop's because it requires fewer nodes and fewer programmers. For more information, visit: http://hpccsystems.com