Big Data: Store Everything And Watch Storage Grow

One of the big stories from the Teradata Partners conference that just finished up in San Diego was the huge advantage that retailers gain by tracking the buying actions of their customers online, and the enormous impact that's going to have on your storage assets in the years ahead. Business managers can't predict what questions they will want to ask about their customers and can't say which collected data is useful and which is not. The answer is to store it all.

David Greenfield

November 17, 2010

Network Computing

Companies have always relied on market research to understand how to market and sell to their customers. But the emergence of online data, particularly when coupled with social media, is creating a veritable information explosion that will enable another order of magnitude of insight into customer behavior. "As we transition from regression models to all of these analytics that you do around a network or a graph of related things, a whole huge amount of discoveries can be made," says Paul Kent, the vice president of research and development at SAS.

Ebay, for example, conducts some 100 different experiments on the site at any one time, involving thousands of customers and resulting in millions of data points, noted Oliver Ratzesberger, Ebay's senior director of architecture and operations. As a result, Ebay gains powerful insights into how users purchase products on the site. One practical example, noted by Ratzesberger, was the way Ebay presented dresses. At any one time there were about 700,000 dresses that women could choose from, far too many for any one person to scroll through. Through research done on its site, Ebay found that a new feature allowing users to establish personal profiles detailing their sizes, preferred styles, manufacturers and so on would be well received by customers.

Storage requirements will grow further, though, not because of the need to map customer behavior across one site but because organizations want to analyze the social graph of their customers. "The real advantage [of the online world] is the ability to track customers longitudinally," says Mark Jeffery from Kellogg. The World Bank of Canada, for example, has a relationship with Weddingbells.com, noted Jeffery, that allows it to track consumer interactions across the sites.

But getting to that valuable "stuff" means building up a massive database of interactions, both on an individual site and informed by intelligence from partnering sites. In fact, Ebay is getting to the point where simply tracking customer actions across the site will no longer be sufficient because its pages change so frequently (about every five minutes). So in order to understand and analyze customer behavior, Ratzesberger thinks Ebay is heading to the point where it will need to store every screen that a customer sees, an enormous amount of data. "If you ask my boss what information he'd like to keep and what information he'd remove, he'll tell you none," says Ratzesberger. "We simply don't know what queries we'll need to run in the future. As such, if we don't store the data today it'll take us 13 months to build up a history to answer that sort of question."

For storage and IT professionals, the inclination to store greater amounts of data and to retrieve more of it suggests growth in both the primary and the secondary data tiers. SSDs, with their faster retrieval times, will be needed to provide access to the very hottest data, with disk providing a "fast" secondary layer for data that is accessed less often but still very frequently.

IT will also need to think about its backup and redundancy plans. It's one thing to duplicate, or back up and restore, 100 gigabytes of data. It's a very different matter to do so with a terabyte or a petabyte of information. Retrieval times are necessarily longer, and the sheer cost of storing the data is so much higher.
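To put rough numbers on that difference, here is a back-of-the-envelope sketch. The sustained rate of 1 Gbit/s is purely an assumption for illustration, not a figure from the conference, and real backups will vary with compression, deduplication and parallel streams.

```python
# Rough transfer-time estimate for a full copy of a data set.
# ASSUMPTION: a constant 1 Gbit/s sustained rate (125 MB/s, decimal units).
# This figure is illustrative only and does not come from the article.
ASSUMED_THROUGHPUT_BYTES_PER_SEC = 125_000_000

SIZES_BYTES = {
    "100 gigabytes": 100 * 10**9,
    "1 terabyte": 10**12,
    "1 petabyte": 10**15,
}


def transfer_hours(size_bytes, throughput=ASSUMED_THROUGHPUT_BYTES_PER_SEC):
    """Hours needed to move size_bytes at a constant throughput."""
    return size_bytes / throughput / 3600


for label, size in SIZES_BYTES.items():
    print(f"{label}: {transfer_hours(size):,.1f} hours")
```

Under that assumed rate, 100 gigabytes moves in about 13 minutes and a terabyte in a little over two hours, while a petabyte takes roughly three months of continuous transfer.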

And while today it is the largest organizations that use data warehousing, look for those capabilities to move downstream to SMEs. From a business standpoint, the benefit to an Ebay is the same as for any e-commerce organization, so there will be a natural imperative to try to claim as much data as possible. The means will also become more viable. In discussion with Todd Sylvester, Teradata's director of R&D strategy, it's clear that the company is looking downstream to bring its software to the Fortune 10,000. This will likely take the form of a data warehouse service where service providers can deliver data warehousing in the cloud. Organizations will then be freed from the challenge of having to build out the necessary storage infrastructure locally.

The introduction won't come soon. Discussions are under way right now with Terremark, the service provider, Sylvester notes. Then in 2011 a number of structural enhancements need to be made to the Teradata database for it to support multi-tenancy. But in 2012, look for Teradata to make its entry and bring the power of big data to small companies.
