Mention cloud storage to most IT professionals and they think of Internet services like Amazon S3 and Nirvanix that store your data in their data centers.
But a storage cloud doesn't have to be public. A wide range of private cloud storage products have been introduced by vendors, including name-brand companies such as EMC, with its Atmos line, and smaller players like ParaScale and Bycast. Other vendors are slapping the "cloud" label on existing product lines. Given the amorphous definitions surrounding all things cloud, that label may or may not be accurate. What's more important than semantics, however, is finding the right architecture to suit your storage needs.
A prototypical cloud storage system is made up of a number of x86 servers, each with its own storage, most commonly four to 16 SATA drives. Users and their applications access the system through standard file access protocols like CIFS and NFS, or through object storage and retrieval interfaces built on SOAP or REST.
The storage nodes in a private cloud are linked together by a layer of smart software, which performs several functions. First, it maintains a global namespace that allows all the storage in the cluster to be accessed as a single entity, so that administrators can add storage capacity on the back end without having to tell applications at the front end how to reach it. The software also handles drive failures and keeps data available to applications and end users.
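The global namespace idea can be illustrated with a small sketch. It is a toy model, not any vendor's design: the class `GlobalNamespace`, its methods, the node names, and the "most free space" placement rule are all assumptions made for illustration. The point is only that clients supply a path and never a node address, so a node can be added without touching the client side.

```python
class GlobalNamespace:
    """Toy global namespace: clients use one path; the cluster picks the node."""

    def __init__(self):
        self.free_gb = {}   # node name -> remaining capacity in GB
        self.location = {}  # object path -> node name holding it

    def add_node(self, name, capacity_gb):
        # Capacity is added on the back end; clients are not reconfigured.
        self.free_gb[name] = capacity_gb

    def write(self, path, size_gb):
        # Hypothetical placement rule: use the node with the most free space.
        node = max(self.free_gb, key=self.free_gb.get)
        if self.free_gb[node] < size_gb:
            raise RuntimeError("cluster is full; add a node")
        self.free_gb[node] -= size_gb
        self.location[path] = node
        return node

    def locate(self, path):
        # Clients only ever supply the path, never a node address.
        return self.location[path]


ns = GlobalNamespace()
ns.add_node("node-a", 100)
first = ns.write("/scans/mri-001.dcm", 80)   # lands on node-a
ns.add_node("node-b", 100)                   # capacity added later
second = ns.write("/cad/bracket.dwg", 50)    # lands on node-b
```

Both writes use the same call with only a path; the second object ends up on the new node without the caller knowing that node exists.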
A private cloud storage infrastructure should also be able to scale from hundreds of terabytes to multiple petabytes. That level of scalability is achieved not with a forklift upgrade, but simply by adding more servers as they're needed.
This architecture provides two major benefits. First, storage administrators can configure and provision new storage nodes quickly and inexpensively. Second, administrators can add capacity only as demand requires, instead of purchasing additional disk space to meet anticipated future growth and then having that capacity sit idle in the present.
However, there are also trade-offs. For one, cloud storage is best suited to unstructured data, such as medical images, engineering drawings, and Office documents. For another, because each x86 server isn't as reliable as a high-end enterprise disk array, a private cloud must store copies of the data on multiple nodes. This requires more raw disk space than an enterprise disk array using RAID-5 or RAID-6. For example, if you set a policy for your private cloud to keep three copies of a 60-GB file for data protection, it would require 180 GB of disk, whereas a 6+2 RAID-6 system would need just 80 GB (60 GB of data plus 20 GB of parity).
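The arithmetic behind that comparison can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative, and the RAID calculation assumes the file is striped evenly across the data drives and ignores any file system or spare-drive overhead.

```python
def replicated_raw_gb(data_gb, copies):
    """Raw disk consumed when every object is stored as full copies."""
    return data_gb * copies


def parity_raid_raw_gb(data_gb, data_drives, parity_drives):
    """Raw disk consumed by a parity RAID set (e.g. 6+2 RAID-6)."""
    return data_gb * (data_drives + parity_drives) / data_drives


three_copies = replicated_raw_gb(60, 3)      # 180
raid6_6_plus_2 = parity_raid_raw_gb(60, 6, 2)  # 80.0
```

With three copies, two-thirds of the raw disk holds redundant data; in the 6+2 RAID-6 case, only a quarter does.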
Beyond Low Cost
PRIVATE CLOUD STORAGE
Can reduce up-front hardware and administrative costs
Takes advantage of low-cost servers and storage
Makes it easier to add storage capacity
Most appropriate for large volumes of unstructured data
Private cloud software vendors are focused on finding ways to differentiate themselves from their competitors. For instance, Cleversafe says it gets around the RAID issue through unique dispersal algorithms that ensure data availability with less than 40% overhead.
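To see how a dispersal scheme can come in under 40% overhead, consider a generic k-of-n model in which an object is split into n slices and any k of them are enough to rebuild it. This is a sketch of that general idea, not Cleversafe's algorithm, and the 16-slice, 12-threshold figures are hypothetical values chosen only to make the arithmetic concrete.

```python
def dispersal_overhead(total_slices, threshold):
    """Extra raw storage as a fraction of the original data size."""
    return (total_slices - threshold) / threshold


def tolerated_failures(total_slices, threshold):
    """Number of slices that can be lost while the data stays readable."""
    return total_slices - threshold


overhead = dispersal_overhead(16, 12)   # about 0.33, i.e. roughly 33% overhead
losses = tolerated_failures(16, 12)     # 4

# For comparison, three full copies carry 200% overhead and survive 2 losses.
replication_overhead = (3 - 1) / 1      # 2.0
```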
Several other vendors include location-aware policy engines that copy data to nodes in specific geographical locations. With DataDirect Networks' Web Object Scaler, Bycast's StorageGrid, and EMC's Atmos, administrators can specify, for example, that a copy of each object in a folder be stored in both New York and Los Angeles, and that further copies be stored in two other locations.
This not only protects data from data center failures but can also put objects on storage clusters close to the users who need them. Bycast's policy engine takes this notion one step further by including elements, such as storage tiering, that can migrate objects from more-expensive to less-expensive disk, and even to and from tape.
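A location-aware placement rule of this kind might be modeled as follows. This is a hedged sketch, not the policy syntax of any of the products named above: `PlacementPolicy`, `place`, and the site list are hypothetical, and the sketch assumes one copy per required site plus a fixed number of copies at any other available sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementPolicy:
    required_sites: tuple  # sites that must each hold one copy
    extra_copies: int      # additional copies, each at some other site


def place(policy, available_sites):
    """Return the list of sites that should hold a copy of an object."""
    missing = [s for s in policy.required_sites if s not in available_sites]
    if missing:
        raise ValueError(f"required sites unavailable: {missing}")
    others = [s for s in available_sites if s not in policy.required_sites]
    if len(others) < policy.extra_copies:
        raise ValueError("not enough other sites for the extra copies")
    # Assumption: extra copies simply go to the first other sites listed.
    return list(policy.required_sites) + others[:policy.extra_copies]


policy = PlacementPolicy(required_sites=("New York", "Los Angeles"), extra_copies=2)
sites = ["New York", "Chicago", "Los Angeles", "London", "Tokyo"]
targets = place(policy, sites)
# targets == ["New York", "Los Angeles", "Chicago", "London"]
```

A real policy engine would likely weigh factors such as user proximity, storage tier, and node health when choosing the extra sites; here the choice is deliberately trivial.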
Organizations planning to offer private cloud storage services to internal departments may want to consider multitenant features that allow storage to be partitioned among different groups. For example, IT could carve out one section of the private cloud for HR and another for marketing, and then charge those departments based on usage. This means having delegated administration models and/or virtual servers that restrict each group's access and visibility to its own data and the resources assigned to it. A multitenant storage system should also include accounting features that collect usage data, such as peak utilization, to help IT determine chargebacks.
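The accounting side of multitenancy can be sketched in a few lines. Everything here is an assumption for illustration: the function `usage_report`, the sample readings, the rate, and the choice to bill on peak usage rather than average usage are not taken from any particular product.

```python
from collections import defaultdict


def usage_report(samples, rate_per_gb):
    """Summarize per-tenant usage from (tenant, used_gb) readings."""
    readings = defaultdict(list)
    for tenant, used_gb in samples:
        readings[tenant].append(used_gb)

    report = {}
    for tenant, values in readings.items():
        peak = max(values)
        report[tenant] = {
            "peak_gb": peak,
            "average_gb": sum(values) / len(values),
            "charge": peak * rate_per_gb,  # assumption: bill on peak usage
        }
    return report


samples = [("HR", 120), ("Marketing", 300), ("HR", 150), ("Marketing", 260)]
report = usage_report(samples, rate_per_gb=0.5)
# report["HR"] == {"peak_gb": 150, "average_gb": 135.0, "charge": 75.0}
```

Billing on peak rather than average usage reflects the capacity IT actually has to provision, but either basis could be swapped in by changing one line.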