Amazon's new Glacier service brings the cost of cloud storage down to just 1 cent per gigabyte per month. To make the deal even better, Amazon claims Glacier is designed to deliver 11 nines (99.999999999%) of reliability by storing data across multiple systems in multiple facilities.
Of course, as Robert Heinlein so famously wrote, there's no such thing as a free lunch. Unlike Amazon's significantly more expensive S3 cloud storage service, Glacier doesn't provide immediate access to your data. You have to request a retrieval job to access an object (or an archive of objects) that you've stored on Glacier, and Amazon can take three to five hours to make your data available.
In addition, while you can retrieve up to 5% of the data you've stored on Glacier each month for free, you pay retrieval fees if you want to access more. Interestingly, these retrieval charges don't apply to retrievals by EC2 cloud servers, which could enable an intriguing EC2-based e-discovery service that indexes and searches data while avoiding both the retrieval charges and the Internet transfer time.
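To make the free allowance concrete, here is a minimal Python sketch of the billable portion of a month's retrievals. It assumes a flat allowance of 5% of stored data per month, as described above, and ignores any finer-grained proration in Amazon's actual billing; the function name is hypothetical.

```python
# Free retrieval allowance described above: 5% of stored data per month.
FREE_PERCENT_PER_MONTH = 5


def billable_retrieval_gb(stored_gb: float, retrieved_gb: float) -> float:
    """Return the GB retrieved in a month beyond the free allowance.

    Simplified model: the allowance is a flat 5% of stored data per month,
    with no daily proration or per-hour peak calculation.
    """
    free_gb = stored_gb * FREE_PERCENT_PER_MONTH / 100
    # Retrievals that stay under the allowance cost nothing.
    return max(0.0, retrieved_gb - free_gb)


# Storing 10,000 GB gives a 500 GB free allowance, so retrieving 800 GB
# leaves 300 GB subject to retrieval fees.
print(billable_retrieval_gb(10_000, 800))  # 300.0
```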
It seems to me that Glacier's cost and service model make it an attractive alternative to off-site tape storage at Recall or Iron Mountain. As my friend Storagebod noted, however, three to five hours of retrieval time may not be fast enough for a media organization that has to edit its last Neil Armstrong interview into the obit running on the evening news.
On the other hand, many organizations have lots of what I jokingly call WORN (write once, read never) data that they retain either because of legal mandates or because no one has the guts or authority to delete it: the home directories and HR records of users fired five years ago, for example, or X-rays of inactive patients. If the IT department could set expectations properly, that data could be archived to Glacier and no one would ever have to worry about it again.
The other case where I find Glacier attractive is as the storage location of last resort for backup data. Many smaller organizations that have just one data center send tapes to the warehouse once a week because they don't want to pay a courier to pick up tapes daily. If their backup software supported Glacier, it could automatically duplicate the local backup data to Glacier as each backup job completed.
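The duplication workflow described above amounts to a post-job hook: when a backup job finishes successfully, copy its output files to Glacier and record the archive IDs, since Glacier retrievals are requested by archive ID. The Python sketch below is hypothetical throughout. `BackupJob`, `GlacierDuplicator` and the `uploader` callable are illustrative names, not any backup product's API; a real uploader might wrap the AWS SDK's Glacier archive-upload call.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class BackupJob:
    """Hypothetical stand-in for a completed backup job."""
    name: str
    output_files: List[Path]
    succeeded: bool


# Hypothetical uploader: copies one local file to Glacier, returns its archive ID.
Uploader = Callable[[Path], str]


@dataclass
class GlacierDuplicator:
    """Hypothetical post-job hook that duplicates finished backups to Glacier."""
    uploader: Uploader
    archive_ids: Dict[str, str] = field(default_factory=dict)

    def on_job_complete(self, job: BackupJob) -> List[str]:
        # Skip failed jobs so a partial backup never becomes the offsite copy.
        if not job.succeeded:
            return []
        uploaded = []
        for path in job.output_files:
            archive_id = self.uploader(path)
            # Keep the archive ID; a later retrieval job must name it.
            self.archive_ids[str(path)] = archive_id
            uploaded.append(archive_id)
        return uploaded


if __name__ == "__main__":
    # A fake uploader stands in for the real Glacier upload.
    hook = GlacierDuplicator(uploader=lambda p: "archive-" + p.name)
    job = BackupJob("nightly", [Path("/backups/nightly-001.tar")], succeeded=True)
    print(hook.on_job_complete(job))  # ['archive-nightly-001.tar']
```

Injecting the uploader keeps the hook testable without an AWS account and lets the same logic sit behind whatever backup software triggers it.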
This spring I won "The Great Debate: Cloud Storage is Dead on Arrival" at Interop. A large part of my argument was that I could buy cheap server-based storage for my data center, comparable to the gear that cloud providers use, for substantially less than S3's cost of storage. At $120 per terabyte per year, however, Glacier is actually cheaper than most in-house disk products. The Aberdeen NAS solution I used as an example would cost about 30% more than Glacier and would still leave all your data in one storage array, in one location.
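The $120 figure follows directly from the 1-cent price. This short Python sketch does the conversion, assuming decimal terabytes (1 TB = 1,000 GB), which is what produces $120; the function name is illustrative.

```python
def glacier_usd_per_tb_year(cents_per_gb_month: int = 1, gb_per_tb: int = 1000) -> float:
    """Annual Glacier storage cost for one terabyte, in dollars.

    Assumes decimal terabytes (1 TB = 1,000 GB) by default; pass
    gb_per_tb=1024 to see the cost for binary terabytes instead.
    """
    cents_per_tb_year = cents_per_gb_month * gb_per_tb * 12  # 12 months per year
    return cents_per_tb_year / 100  # convert cents to dollars


print(glacier_usd_per_tb_year())  # 120.0
```

Counting 1,024 GB per terabyte nudges the figure to $122.88, which doesn't change the comparison.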
A Quantum i80 tape library that holds 75 Tbytes on 50 LTO-5 cartridges costs about $15,000 with a full complement of tapes, or about a third of what storing 75 Tbytes on Glacier would cost over five years. Again, that buys one copy in one location, which is nowhere near Glacier's promised 11 nines of reliability.
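The "about a third" comparison can be checked with the column's own numbers. This Python sketch assumes the library is filled to its 75 TB native capacity (50 LTO-5 cartridges at 1.5 TB each) and, as a simplification, ignores drive maintenance, power and media replacement; the function name is illustrative.

```python
def five_year_comparison(capacity_tb: int = 75,
                         glacier_usd_per_tb_year: int = 120,
                         tape_library_usd: int = 15_000,
                         years: int = 5) -> tuple:
    """Return (Glacier total cost, tape cost as a fraction of Glacier cost).

    capacity_tb defaults to 50 LTO-5 cartridges x 1.5 TB native = 75 TB.
    Simplified: one tape copy, no maintenance, power or media refresh costs.
    """
    glacier_total = capacity_tb * glacier_usd_per_tb_year * years
    return glacier_total, tape_library_usd / glacier_total


glacier_total, ratio = five_year_comparison()
print(f"Glacier over 5 years: ${glacier_total:,}, tape/Glacier: {ratio:.2f}")
# Glacier over 5 years: $45,000, tape/Glacier: 0.33
```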
To get comparable reliability from tape, you'd have to make multiple copies and send one off-site. While the cost of bonded couriers and warehousing will vary depending on how often you ship tapes and how many you ship at a time, Glacier appears to be at least in the same cost ballpark as the traditional solution, even before you factor in the extra charges for rush delivery of tapes when you need your data sooner than the next business day.
Amazon has been tight-lipped about whether it uses tape to implement Glacier. Given the service-level agreement, it certainly could be tape, or it could be a disk-based object store with deep erasure coding. I, for one, am interested in the answer.
Once a few backup and archive software vendors add Glacier support to their products, cloud storage will be a lot more attractive for idle data, at just a penny per gigabyte per month.
Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., that concentrates on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage ...