I'm satisfied that snapshots and replication in conventional storage systems can serve the same function as more traditional backup schemes. While snapshots make satisfying the most common restore requests easy, the limitations of the snapshot mechanism in most storage systems leave most organizations using snapshots as a supplement to, not a replacement for, backup copies. Does the cloud change the snapshot-as-backup calculus? Some cloud storage vendors say it does.
The primary function of a cloud storage gateway, like those from vendors such as Nasuni, Cirtas and StorSimple, is to let users take advantage of cloud storage without rewriting their applications. Without a gateway, your applications have to put and get data objects from the cloud storage provider you've chosen through that vendor's particular API. Your users want to store their data on a NAS or file server via CIFS or NFS, and server applications like Exchange and SharePoint need traditional block interfaces. The cloud storage gateway maps these common protocols onto the cloud object store and provides a local cache to make your applications run faster.
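To make the mapping concrete, here's a minimal sketch of the gateway idea: file-style writes and reads get translated into put/get calls against a (here simulated) cloud object store, with a local cache absorbing repeat reads. The class and method names are illustrative only, not any vendor's actual API.

```python
class CloudObjectStore:
    """Stand-in for a provider's object API (S3-style put/get)."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class Gateway:
    """Exposes a file-like interface; persists everything to the object store."""
    def __init__(self, store, cache_size=1024):
        self.store = store
        self.cache = {}            # local working-set cache
        self.cache_size = cache_size

    def write_file(self, path, data):
        self.store.put(path, data)    # durable copy goes to the cloud
        self._cache_put(path, data)   # keep a hot copy locally

    def read_file(self, path):
        if path in self.cache:        # cache hit: no round trip to the cloud
            return self.cache[path]
        data = self.store.get(path)   # cache miss: fetch from the object store
        self._cache_put(path, data)
        return data

    def _cache_put(self, path, data):
        if len(self.cache) >= self.cache_size:
            self.cache.pop(next(iter(self.cache)))  # naive eviction
        self.cache[path] = data


store = CloudObjectStore()
gw = Gateway(store)
gw.write_file("/share/report.docx", b"q3 numbers")
print(gw.read_file("/share/report.docx"))  # served from the local cache
```

A real gateway does this at the block or chunk level rather than whole files, but the pattern is the same: the cloud holds the authoritative copy, and the cache exists purely for speed.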
The cool part is that the gateways also provide snapshots.
Since cloud storage providers will be glad to sell you as much space as you
want, the gateway vendors have designed their systems to let you have an
unlimited number of snapshots of your volume or file system.
That's a big step up from the 16 to 255 snapshots most disk systems let you keep online. And since the snapshots exist out in the cloud, while your gateway has a couple of terabytes of cache for the working set of data that you and your applications actually access day to day, those snapshots won't have any impact on performance.
A redundant pair of caching gateways is reliable enough that I would consider them, and the snapshot data they hold, to satisfy my need for a local copy. Since all your data is in the cloud, data is "backed up" in close to real time, and if you need to recover at a remote location you just fire up a gateway at the remote site. The new gateway will start populating its cache as your users access their data. Because the gateway restores data in small chunks as needed, your users will be accessing their most critical data faster than if you restored a whole server from a conventional backup, even though it's restoring across an Internet link.
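The "restore by faulting in chunks" behavior can be sketched like this: a freshly started recovery gateway owns no local data, and each read pulls only the chunks it actually needs across the WAN. The names and chunk layout here are illustrative assumptions, not any product's real design.

```python
class RecoveryGateway:
    """A gateway brought up at a recovery site with an empty cache."""
    def __init__(self, cloud_chunks):
        self.cloud = cloud_chunks   # chunk_id -> bytes; the offsite copy
        self.cache = {}             # starts empty at the recovery site
        self.fetches = 0            # count of WAN round trips

    def read(self, chunk_id):
        if chunk_id not in self.cache:
            self.cache[chunk_id] = self.cloud[chunk_id]  # fetch on miss only
            self.fetches += 1
        return self.cache[chunk_id]


# The offsite copy of a volume, already chunked in the cloud.
cloud = {0: b"mail", 1: b"box ", 2: b"data"}
gw = RecoveryGateway(cloud)

# Users touch only chunk 2 first; nothing else crosses the WAN yet.
gw.read(2)
print(gw.fetches)  # 1 -- only the accessed chunk was restored
```

Contrast that with a conventional restore, which would drag all three chunks down before anyone could read anything.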
Now don't get me wrong, cloud snapshots aren't perfect. If you decide to keep 5,000 snapshots, you'll have to pay Amazon or Nirvanix every month to keep all the data in those snapshots online. And like other snapshots, snaps in the cloud don't come with extensive metadata, so a keyword search might be a slow and painful experience as the whole data set has to be dragged down from the cloud.
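A quick back-of-the-envelope calculation shows how those monthly charges add up. The per-gigabyte price and the daily change rate below are illustrative assumptions for the sake of the arithmetic, not any provider's published pricing.

```python
base_volume_gb = 1000        # size of the primary data set
daily_change_gb = 10         # unique data each snapshot adds (after dedup)
snapshot_count = 5000        # snapshots retained online
price_per_gb_month = 0.10    # assumed $/GB-month for cloud capacity

# Total capacity billed: the live volume plus the unique data in every snapshot.
total_gb = base_volume_gb + snapshot_count * daily_change_gb
monthly_cost = total_gb * price_per_gb_month
print(f"{total_gb} GB stored, ${monthly_cost:,.2f}/month")
```

Even with a modest change rate, thousands of snapshots turn into tens of terabytes of billable capacity, so retention policy still matters in the cloud.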
Using the cloud as your primary storage also puts you at the mercy of your cloud storage provider. If they lose your data, raise their rates, go belly up or otherwise cause you problems, retrieving your data and getting set up elsewhere will be a painful process. Now I'm pretty sure that top-notch providers are better at data management than most organizations, but there is some risk here. The truth is that cloud storage provider SLAs, as important as they may be, can't make you whole after a cloud service provider loses your data any more than Kodak sending you a new roll of film made you whole after they lost the pictures of your honeymoon in Bora Bora or the kid's first steps.
I'm looking forward to the day when cloud gateways can store their data to multiple cloud back ends to reduce this risk. Even better would be if they could write to both a local object store, like Caringo CAStor or EMC Atmos, and a public cloud provider. That would give me fast access for eDiscovery and real-time offsite backup.