What Amazon's Glacier Can Be Used For
All copies of data can be divided into two general categories: production and data protection. A production copy of data can be an active, changeable copy (where there is still a reasonable likelihood of change) or an archive copy (where the data content is fixed and thus not changing). We are purposely ignoring the commingling of data in the same storage pool in this discussion.
For the most part, newer data is accessed more regularly and older data less often. Even within archive data, there is a bifurcation between pools where access is needed within seconds or minutes (an active archive) and pools where retrieval can take hours or days without being a problem (a deep archive).
The primary use of Glacier is for the storage of deep archive data, or "cold" storage, an area that's often neglected because it isn't very exciting. SSDs are hot and sexy because performance-requiring data is exciting, whereas data that needs to be put in suspended animation isn't. Cold storage is important because more and more data is falling into an archive category (80% of data is presumed to be fixed), and even though much of that data could be culled (permanently deleted), the intensive labor required makes that unlikely to happen. So Glacier provides needed attention to an important part of the storage market.
What Amazon's Glacier Brings to the Table
Glacier provides storage as a service in a public cloud for the long-term preservation of data--the vast majority of which should never need to be retrieved. AWS' use of 256-bit AES encryption protects the confidentiality of the data. It's the same technology used in its S3 offering, and it indicates that Amazon is applying one of the data protection techniques that can be used with object storage.
Quite frankly, most companies' internal IT shops would be challenged to reach similar levels of data security and integrity. For example, AWS organizes its service infrastructure into regions composed of what it calls Availability Zones. Its East Coast region is supported by 10 geographically separate AWS data centers, and any data stored in the region is backed up to two facilities in addition to its primary location. Very few enterprises have the scale to equal that level of physical disaster recovery protection.
Yes, transfer speeds in and out may be a concern for large-scale deployments, but AWS has essentially dissolved the barriers that may have prevented an organization from moving to a public cloud for storing the kinds of data Glacier is designed to protect. On the surface, Glacier is a solid solution.
The User Is Not Off the Hook With Glacier
As good as AWS Glacier appears to be, it doesn't let users (from individuals to large organizations) off the hook as far as management is concerned. Planning what data needs to be stored and how it can be retrieved if necessary is fully in the hands of Glacier customers. For example, data is stored in what Glacier calls archives, which live within vaults. Storing one file per archive would simplify recovery, but with millions of files that isn't practical. Conversely, retrieving an archive containing millions of documents when only a handful are needed incurs extra time and cost.
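The tradeoff described above--one file per archive at one extreme, millions of files in a single archive at the other--is really a bundling problem. A minimal sketch of one middle-ground strategy, grouping related files into size-capped bundles where each bundle would become one archive (the `bundle_files` helper and the 1 GB cap are illustrative assumptions, not part of Glacier's API):

```python
def bundle_files(files, max_bundle_bytes=1_000_000_000):
    """Group (name, size) pairs into bundles no larger than max_bundle_bytes.

    Each bundle would be packaged (e.g., as a tarball) and uploaded as a
    single Glacier archive, so a later retrieval pulls back a bounded
    amount of related data rather than one enormous archive or millions
    of tiny ones.
    """
    bundles, current, current_size = [], [], 0
    for name, size in files:
        # Start a new bundle when adding this file would exceed the cap.
        if current and current_size + size > max_bundle_bytes:
            bundles.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        bundles.append(current)
    return bundles

# Example: five 400 MB files with a 1 GB bundle cap
files = [(f"doc{i}", 400_000_000) for i in range(5)]
print(bundle_files(files))  # [['doc0', 'doc1'], ['doc2', 'doc3'], ['doc4']]
```

In practice the grouping key would also reflect retrieval likelihood, as the next paragraph suggests, so that "hot" archive data and "cold" archive data land in different archives.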
Thinking ahead is the key to using Glacier effectively. If some data is more likely to be retrieved than other data, assigning it to specific archives would be useful. Moreover, Glacier can't solve the problem of obsolescent software used to access data, which is a major issue in the long-term preservation of data. For guidance in this area, organizations should study the good work that the eXtensible Access Method (XAM) group at the Storage Networking Industry Association is doing on this subject.
Glacier charges $120 per terabyte per year to store data, a low cost that has created not just excitement in the market but also claims that tape is dead. Glacier doesn't charge for transfers into storage, but it does charge for transfers out of storage. While the service does present a challenge to tape, that isn't because a disk solution is inherently less costly.
The argument comes down to an in-house versus public cloud offering. For companies that don't use tape in-house, Glacier may very well provide a more effective service because of its cost and ease of use--it requires no up-front investment in equipment, personnel or training. For those who increasingly use disk for backup but also maintain a tape infrastructure, Glacier may prove an attractive proposition. Tape vendors will have to convince them that archiving--both active and deep--as well as deep backup copies are best left on tape. That story plays best to large, sophisticated customers. In other words, tape isn't dead, but it faces another challenge.
Overall, Glacier provides a pay-as-you-go expense model that reflects operational cost versus capital cost and is easy to plan for. Moreover, it seems affordable and competitive.
Amazon seems to be a prescient company that anticipates needs and defines well-crafted solutions that meet real requirements for both consumers and businesses. Glacier is just the latest example of this. Although it's still fashionable to talk about the private cloud, Amazon inexorably continues to make its case for the use of the public cloud in a growing number of ways.
No, public clouds aren't taking over all of the cloud world (hybrid seems the most reasonable use-case scenario over time). But the public cloud will continue to make inroads in targeted areas, including the deep archiving and selected backup solutions that Glacier delivers.
Amazon's Glacier won't kill tape, but it should make tape vendors stand up and take notice, especially since the company has been the first mover/innovator among public cloud providers in numerous spaces. Though Glacier doesn't qualify as an automatic loss for tape vendors, in all, it looks like another win for Amazon.