There are several types of devices that you can archive to. The first, and one that might be overlooked, is a big disk array. Although these often can't do continuous data verification and might not scale as far as other, more archive-specific systems do, they have one big advantage: price. These systems tend to be very cost-effective if your archive requirements won't reach the limits of a single array. A few of these systems also have very mature power-saving capabilities, such as spinning down idle drives.
Another option outside of traditional archive storage systems is cloud storage services. Cloud has the advantage of not taking up any of your data center footprint and never running out of capacity. Some cloud providers, via third-party archive solutions, can also provide complete data integrity checking. Cloud services also, of course, have the advantage of pay-as-you-go pricing, so the upfront investment is minimal. The downside is that these systems are pay-as-you-grow as well: the bill rises with every terabyte you add, and it never stops. Storing terabytes and terabytes of information in the cloud for decades could be very expensive over time.
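To make the pay-as-you-grow concern concrete, here is a minimal sketch of the arithmetic. The function name, the billing model (each year billed on the capacity stored at the start of that year) and every number in the usage note are illustrative assumptions, not figures from any provider.

```python
def cumulative_cloud_cost(initial_tb, annual_growth_tb, price_per_tb_month, years):
    """Total spend over `years` for an archive that grows by a fixed amount each year.

    Simplifying assumption: each year is billed for 12 months on the
    capacity stored at the start of that year.
    """
    total = 0.0
    stored_tb = initial_tb
    for _ in range(years):
        total += stored_tb * price_per_tb_month * 12  # one year of monthly bills
        stored_tb += annual_growth_tb                 # the archive keeps growing
    return total
```

With a hypothetical 100 TB archive growing 50 TB a year at a hypothetical $10 per TB per month, the first year costs $12,000 and the second $18,000. Because the archive only ever grows, each year's bill is larger than the last, which is why a modest monthly rate can add up over decades.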
There is also the option to build your own cloud storage system in-house; in other words, a private cloud. As I recently described in my article "What is Object Storage," most of these systems tend to use an object file layout. This gives them tremendous scalability and consistent performance even as the amount of archive data increases. Leveraging an object layout also provides the foundation for doing continuous data verification.
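One common way an object layout supports verification is to record a checksum when each object is written and recompute it on a schedule. The sketch below illustrates that idea with a toy in-memory store; the class and method names are hypothetical, and real object storage products may implement verification quite differently.

```python
import hashlib


class ObjectStore:
    """Toy in-memory object store that can re-verify its contents."""

    def __init__(self):
        # object ID -> (data, SHA-256 digest recorded at write time)
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store data and return its ID (here, simply its SHA-256 digest)."""
        digest = hashlib.sha256(data).hexdigest()
        self._objects[digest] = (data, digest)
        return digest

    def verify_all(self) -> list:
        """Return the IDs of objects whose data no longer matches its digest."""
        return [
            object_id
            for object_id, (data, digest) in self._objects.items()
            if hashlib.sha256(data).hexdigest() != digest
        ]
```

Running `verify_all()` on a schedule is what makes the verification "continuous": silent corruption is found by the scan itself rather than by a failed restore years later.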
These systems also tend to scale one node at a time, providing a similar pay-as-you-grow capability. Unlike the cloud, though, you own it. This has its pros and cons. There is also the challenge that you have to store all your data on disk. That means these systems need to be powered and running in order to operate. Few scale-out object storage systems have developed the capability to "spin-down" nodes.
Finally, there is tape. Tape wins hands down on price and on power efficiency. The above technologies all provide near-instant retrieval. Tape does not. But you have to ask yourself: if a request comes in for data that is 10 years old, do you really need to recover it in seconds? Or can it wait a few minutes for the tape to be loaded into a tape drive, the data found and then recovered? If it can wait, then tape might be for you.
Another concern about tape is data integrity. As we discussed in our webinar "The Four Reasons The Data Center is Returning To Tape," tape cartridges have actually been proven to be more reliable than disk drives, but they don't have the built-in data integrity checks that some of the above methods do. However, some archiving solutions that support tape can perform scheduled scans of the tape cartridges so that integrity can be assured.
So, which one to pick? Most vendors mistakenly look at the archive target as a zero-sum game, as if it all must be on their hardware. We find that most data centers are better served by a mixed approach that leverages two or more of the above solutions: Use disk for the medium-term archive of data, and tape for the long-term deep archive. In fact, in an upcoming column I'll discuss how to leverage tape with either a private or public cloud.
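The mixed approach can be expressed as a simple age-based placement rule. This is only a sketch: the three-year cutoff, the tier names and the function name are hypothetical, and the right boundary between "medium-term" and "long-term" depends on how often your organization actually recalls old data.

```python
from datetime import date

# Hypothetical cutoff: data untouched for longer than this moves to tape.
DISK_RETENTION_YEARS = 3


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick an archive tier from the time elapsed since the data was last accessed."""
    age_days = (today - last_accessed).days
    if age_days < DISK_RETENTION_YEARS * 365:
        return "disk"  # medium-term archive: near-instant retrieval
    return "tape"      # long-term deep archive: cheapest, slower to recall
```

For example, `choose_tier(date(2010, 1, 1), date(2021, 1, 1))` places 11-year-old data on tape, while data touched a year ago stays on disk.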