To keep things in perspective, my company, Storage Switzerland, sorts the "big" suppliers we work with into categories. Categorizing the vendors you talk to in a similar way might help you choose the right products; otherwise you might end up with a Big Mess.
-- Big data analytics. In my opinion, this is what started it all. It typically means that you have a large pool of data, often billions of very small machine-generated files, that, if analyzed, can lead to better decision making for your company. The larger and more historical the data set, the easier it is to make the right decisions. The faster that data can be analyzed, the more quickly and frequently those decisions can be made.
As I discussed in Designing Big Data Storage Infrastructures, storage plays a key role in deriving full value from a big-data effort. The more efficient the storage, the more data you can afford to keep for historical analysis. The faster the storage performs, the more often you can make decisions based on fresher data.
-- Big data archiving. The second big-data category driving the Big trend follows more of a sequential access model than analytics does. These are often very large files that need to remain available but sit idle until some event triggers their return to popularity. A common example: when a celebrity dies or does something wrong, videos and photos from that person's past become heavily requested. Because these files are so large, they need to be stored very inexpensively, potentially for a long time, until they become hot. During these bursts of access, they then need to be moved to a faster storage medium.
Tape, as I discussed in What is Data Monetization?, has found new life in big-data archiving because it can store data very cost-effectively for long periods and bring back large files quickly. Big-data archiving might also be the best reason to implement a tiered storage strategy that includes flash, scale-out disk storage, and tape. The software that drives these designs has come a long way since earlier tiered storage initiatives, and the monetization of these data assets can make the investment in a tiered storage infrastructure worthwhile.
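To make the archiving pattern concrete, here is a minimal, hypothetical sketch of the kind of policy tiering software might apply. It is not how any particular product works. The tier names, the `burst_threshold` of 100 requests, and the idea of a fixed "observation window" are all illustrative assumptions. Files start on tape. A burst of requests moves a file straight to flash, and a file that sees no requests in a window steps back down one tier at a time.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tiers, ordered from cheapest/slowest to fastest/most expensive.
TIERS = ["tape", "disk", "flash"]


@dataclass
class ArchivedFile:
    name: str
    tier: str = "tape"        # large archive files start on the cheapest tier
    recent_requests: int = 0  # requests seen in the current observation window


def record_request(f: ArchivedFile) -> None:
    """Count one access to the file during the current window."""
    f.recent_requests += 1


def rebalance(files: List[ArchivedFile], burst_threshold: int = 100) -> None:
    """Run at the end of each observation window (threshold is an assumption).

    A file hit by a burst of requests jumps straight to the fastest tier;
    a file with no requests in the window drops one tier toward tape.
    """
    for f in files:
        level = TIERS.index(f.tier)
        if f.recent_requests >= burst_threshold:
            f.tier = TIERS[-1]
        elif f.recent_requests == 0 and level > 0:
            f.tier = TIERS[level - 1]
        f.recent_requests = 0  # reset counters for the next window
```

For example, if one archived video receives 150 requests in a window, `rebalance` moves it to flash while an untouched file stays on tape. If interest then dies down, the video drifts back to disk and then to tape over the next two windows. Promoting in one jump but demoting gradually is one way to serve a sudden spike quickly without moving a file back and forth if interest briefly flares up again.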
A real concern, especially for big-data analytics, is how to protect all this data. Backup was a key limiter in the rollout of virtual infrastructures, and without "big backup," data protection might become a key limiter in the rollout of big-data initiatives. In an upcoming column, we'll discuss some of your options for solving this daunting challenge.
George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments.