Big Data Changes Storage Needs

Growth of unstructured data forces IT managers to look for new ways to scale storage capabilities, with cloud storage a leading option.
In our last entry we talked about the trend of converging storage and compute infrastructures into a single platform. In this entry we will talk about another form of convergence--the convergence of cloud storage into the data center. Like everything else in the cloud, cloud storage can be implemented in many forms, and connecting it to a data center raises several more issues.

The big driver for a cloud storage system--whether private, public, or hybrid--is the unprecedented growth in unstructured data. Unstructured data is typically thought of as user data and made up of things like documents, spreadsheets, and presentations. User data has since grown to encompass much larger data sets like audio and video. Rich media can no longer simply be wiped from data center storage systems, since in many cases this data has a legitimate purpose and value to the organization.

The bigger challenge may come from machine-generated data, which is produced by devices or even other applications. Machines can generate data much faster than humans can, and the need to store all this data, thanks to initiatives like big data analytics, is becoming more important. As we discussed in our recent article "The TCO Problem of Storage," conventional storage systems may not be able to deliver the economics required to keep pace with the rate of unstructured data growth.

The combination of increased user data and machine-generated data is the key factor that will drive many data centers toward some form of cloud storage. These storage systems will need to be scalable and affordable, and they will need to use both capacity and data protection resources efficiently. Most important, these systems may need to break from the traditions of the past and use object-based storage technology to meet the demand to store billions and billions of pieces of unstructured content.

The first form that this trend may take is essentially borrowing a page from the public cloud provider's playbook: building storage systems from commodity-class hardware that is clustered together and that leverages an object storage model. Several companies today will sell you an object-based storage system designed to provide this type of functionality, so you don't have to develop your own as some of the early cloud providers did.
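To make the object storage model a little more concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the general idea (a flat key space, with each object carrying its own metadata and a content hash), not any vendor's product or API. The names ObjectStore, StoredObject, put, get, and list_keys are hypothetical and chosen for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class StoredObject:
    data: bytes
    metadata: dict
    etag: str  # SHA-256 of the content, used here as a simple integrity check


class ObjectStore:
    """Toy object store: one flat key space instead of a directory tree."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        # Store the payload together with its metadata; return the content hash.
        etag = hashlib.sha256(data).hexdigest()
        self._objects[key] = StoredObject(data, dict(metadata or {}), etag)
        return etag

    def get(self, key):
        # Raises KeyError if the key was never stored.
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are only a naming convention: keys that share a prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ObjectStore()
store.put("video/launch.mp4", b"fake video bytes", {"content-type": "video/mp4"})
store.put("docs/plan.txt", b"fake document bytes", {"owner": "marketing"})
print(store.list_keys("video/"))
```

A real clustered system would spread these objects across many commodity nodes and keep redundant copies. The sketch leaves that out and shows only the addressing model.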

The second form that this trend may take is leveraging public cloud storage providers to store the data. The problem, of course, with using the public cloud to store all of your data is that many organizations may not be able to tolerate the latency involved in transferring data to and from a public facility. And laws and regulations may prohibit that option for many industries.

There are several solutions that extend the public cloud into the private data center by using some form of caching appliance. This allows active data to be stored locally while being replicated to the cloud. Potentially more interesting, as we discussed in a recent case study, some cloud providers are able to set up a cluster of their nodes in your data center. This becomes more than a cache--it quite literally extends a portion of the public cloud directly into your data center. The local cluster allows for high-performance local transfers while remaining a full member of the public cloud provider's infrastructure. It also means significantly less investment in staff to manage and maintain the storage.
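The caching appliance idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any particular appliance: writes are copied to the cloud immediately (real products may replicate asynchronously), the local cache evicts the least recently used item when full, and the cloud is represented by a plain dictionary. The class name CachingGateway and its methods are hypothetical.

```python
from collections import OrderedDict


class CachingGateway:
    """Toy caching appliance: active data stays local, everything is in the cloud."""

    def __init__(self, cloud, capacity):
        self.cloud = cloud          # dict-like stand-in for the public cloud store
        self.capacity = capacity    # maximum number of objects held locally
        self.local = OrderedDict()  # ordered from least to most recently used

    def write(self, key, data):
        # Assumption: replicate synchronously so the cloud always has a copy.
        self.cloud[key] = data
        self._cache(key, data)

    def read(self, key):
        if key in self.local:
            # Local hit: no trip to the cloud, so no wide-area latency.
            self.local.move_to_end(key)
            return self.local[key]
        # Local miss: fetch from the cloud and keep a local copy.
        data = self.cloud[key]
        self._cache(key, data)
        return data

    def _cache(self, key, data):
        self.local[key] = data
        self.local.move_to_end(key)
        while len(self.local) > self.capacity:
            # Evict the least recently used object; the cloud still holds it.
            self.local.popitem(last=False)


cloud = {}
gateway = CachingGateway(cloud, capacity=2)
for name in ("a", "b", "c"):
    gateway.write(name, name.encode())
print(sorted(gateway.local), sorted(cloud))
```

Only the two most recently used objects stay on the local appliance, while all three remain available from the cloud copy.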

In either case, the net result of this trend may be to break down the wall between public and private cloud storage. As data centers become larger, handling unstructured data entirely on their own may become untenable and, as a result, they will need to leverage a model that gives them maximum scalability locally and unlimited scalability remotely.

