Strategic Info Management: Long-Term Storage

Long-term storage doesn't have to stink. We give you the options available to keep media rot and poor processes at bay--and data available when you need it.

September 8, 2006

Remember being a kid and worrying about that little incident with the stink bomb defiling your "permanent record"? At many educational institutions, permanent records, defined as those that have "continued administrative value," must be retained for at least 100 years. What stinks is being the grownup responsible for selecting media that will ensure this data remains retrievable.

Strategic Information Management

In designing a data-archiving system, system architects have four main decisions to make: What to store, how long to store it, how to archive and index it so you can retrieve specific data when needed, and where to store the archive. Here we focus on the last problem. Whole forests have been sacrificed in the name of generating rules and regulations on what to store and for how long: We worked with one tobacco company whose legal department decreed that all tobacco-business-related documents are to be kept forever, figuring the truth can't be worse than failing to produce subpoenaed information. We've also dealt with companies whose legal eagles ordered all e-mail deleted after 30 days.

The Media

Sure, ink on acid-free paper and microforms (microfilm and microfiche) have a history of long-term stability and readability. But they're bulky and time-consuming to access, so your focus should be on digital media. You have five main options for storing digital data for long periods:

» Tape, the conventional choice, offers high information density at a reasonable price, but retrieval times are too long for active archives, such as those holding medical images.

» Disk arrays are fast but cost more, especially when you factor in power, cooling and maintenance.

» Optical media, including CDs and DVDs, have low media costs but are vulnerable to damage from physical handling, and their consumer heritage doesn't inspire confidence--vendors have focused on cheapness, not quality.

» Professional optical formats have a history of long-term data stability but don't have the density of tape.

» Removable hard disks designed for backup are tempting, but no one knows how long a disk on a shelf will last.

Regardless of the medium you choose, putting all your eggs in one basket is never advisable. All projected data lifetimes are based on proper storage, and fire, smoke, flood or other catastrophe can wipe out your archive as easily as it does your primary data. Duplicate archives in multiple locations are the only way to ensure you'll have your data when you need it.

Also, if you're in an industry that mandates non-rewriteability, you'll need WORM (write once/read many)-capable media. Non-rewriteability can be implemented chemically, as in CD-R or DVD-R media, or in software, on systems ranging from tape drives that will write only once to special WORM tape cartridges to disk-array systems like EMC's Centera or a NetApp filer running SnapLock. The key is that no one, not even the system administrator, can make changes to data once it's written.
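To make the write-once rule concrete, here's a toy model of the semantics in Python. The WriteOnceStore class is hypothetical and purely illustrative: real WORM protection is enforced by the media, the drive firmware or the array operating system, precisely so that no one can simply edit the code that enforces it.

```python
class WriteOnceStore:
    """Toy in-memory model of write-once/read-many (WORM) semantics.

    Hypothetical class for illustration only: real WORM protection is
    enforced by the media, drive firmware or array operating system,
    not by application code that an administrator could change.
    """

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        # A key can be written exactly once; any second write is refused.
        if key in self._objects:
            raise PermissionError(f"{key!r} is already committed and cannot be rewritten")
        self._objects[key] = bytes(data)

    def read(self, key: str) -> bytes:
        # Reads are unrestricted: the "read many" half of WORM.
        return self._objects[key]

    def delete(self, key: str) -> None:
        # There is deliberately no code path that removes committed data.
        raise PermissionError(f"{key!r} cannot be deleted from WORM storage")


store = WriteOnceStore()
store.write("ledger-2006-09", b"closing balances")
try:
    store.write("ledger-2006-09", b"edited balances")
except PermissionError as err:
    print("refused:", err)
print(store.read("ledger-2006-09"))
```

The attempted edit is refused and the original record reads back unchanged; the same goes for any attempt to delete it.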

Tale of the Tape

Tape has been going strong for 50 years. Disk vendors keep trying to write its obituary, yet tape capacities continue to increase. In the past few months, vendors have boosted native capacity to 800 GB per tape, and in May, IBM researchers announced that they've managed to cram a whopping 6.67 gigabits per square inch on an experimental tape formulation by FujiFilm. When and if such technology comes to market, likely the end of the decade at best, we could see 8 TB spread over a half-inch cartridge tape like the LTO Ultrium. Be still my heart.

With proper storage (60 to 70 degrees Fahrenheit with 30 percent to 40 percent relative humidity), data on magnetic tape should be readable for 30 years or more. The trick is having a compatible drive. DLT drives typically can read tapes from two generations (seven to nine years) back. Modern midrange (LTO and DLT/SDLT) and data center (IBM 3590/92 and StorageTek T9840-T10000) technologies use servo data prewritten on the tapes at the factory to ensure that heads are positioned accurately. This effectively eliminates a problem that plagued earlier formats, like QIC and even DDS: tapes that couldn't be read reliably in any drive other than the one that wrote them.
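The "two generations back" rule of thumb is easy to turn into a planning check. This Python sketch assumes generations are numbered sequentially (1, 2, 3, ...) and that a drive reads its own generation plus a fixed number of older ones; the function names are hypothetical, and real compatibility matrices vary by format and vendor, so confirm against the published matrix before relying on it.

```python
def drive_can_read(drive_generation: int, tape_generation: int,
                   backward_generations: int = 2) -> bool:
    """Rule-of-thumb read compatibility within one tape family.

    Assumes a drive reads its own generation plus `backward_generations`
    older ones, the DLT pattern described in the text.
    """
    age = drive_generation - tape_generation
    # A negative age means the tape is newer than the drive: unreadable.
    return 0 <= age <= backward_generations


def last_drive_generation_for(tape_generation: int,
                              backward_generations: int = 2) -> int:
    """Newest drive generation that can still read this tape.

    Before the last drive at or below this generation is retired,
    the archive has to be copied to newer media.
    """
    return tape_generation + backward_generations


for drive in range(3, 7):
    print(f"generation-{drive} drive reads generation-3 tape:", drive_can_read(drive, 3))
```

Run against a generation-3 tape, the loop shows generations 3 through 5 can read it and generation 6 cannot, which is the point at which that archive must already have been migrated.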

Although it may be hard to reconcile a 30-year life with reports that somewhere from 20 percent to 50 percent of attempts to restore data from backup tapes fail, our experience is that restore failures are most often procedural--IT added a volume to that server and forgot to add it to the backup job, for example, or the backup job skipped that important open file. A more reasonable expectation from tape may be only 15 years, depending on the error-correction capabilities of the format to mitigate bit rot.
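Because the usual culprits are procedural, they can be caught with a routine audit. Here's a minimal Python sketch of the two checks described above; the inputs are hypothetical, and in practice they'd come from the server's volume list, the backup job definition and the job's log.

```python
from collections.abc import Iterable


def audit_backup_coverage(server_volumes: Iterable[str],
                          job_volumes: Iterable[str],
                          skipped_files: Iterable[str]) -> dict:
    """Flag the procedural gaps that most often break restores.

    All inputs are hypothetical stand-ins for data pulled from the
    server, the backup job definition and the last job log.
    """
    # Volumes that exist on the server but were never added to the job.
    missing = sorted(set(server_volumes) - set(job_volumes))
    # Files (often open files) the last run reported as skipped.
    skipped = sorted(skipped_files)
    return {
        "volumes_not_in_job": missing,
        "files_skipped_last_run": skipped,
        "ok": not missing and not skipped,
    }


report = audit_backup_coverage(
    server_volumes=["C:", "D:", "E:"],      # E: was added last month
    job_volumes=["C:", "D:"],
    skipped_files=["D:/db/orders.mdf"],     # open database file
)
print(report)
```

Running something like this after every change to a server, rather than discovering the gap during a restore, is the whole design choice.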

Experts disagree on whether today's tapes require periodic retensioning but agree that most tape failures are due to tape-edge defects or tape wear. Use brand new tapes for your archives, and discard cartridges that have been dropped.

Day of the Disk

Spinning disks, even SATA disks, are fast, making them suitable for active archives. And the rising capacity and reliability of SATA drives mean keeping archives online only isn't out of the question. Even organizations that need WORM for compliance can use systems, like EMC's Centera or NetApp's SnapLock, that store data on disk arrays in the nonmodifiable format regulators require.

Vendors of enterprise-class drives typically claim MTBF (mean time between failures) in excess of 1 million power-on hours, or 114 years. This is not intended to imply that the average drive will last 114 years, but rather that, across a large population, roughly one of every 114 drives will fail per year. In addition to just dropping dead, disk drives sometimes can't read a given sector. Manufacturers typically cite an unrecoverable read error rate of 1 in 10^15 bits read for enterprise-class Fibre Channel and SCSI drives, and 1 in 10^14 for their higher-capacity SATA models targeted at RAID applications. Under normal conditions, if a read error occurs, the RAID controller will rebuild the affected sector using mirrored or parity data.
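The MTBF arithmetic is worth doing yourself when a spec sheet lands on your desk. This Python sketch converts an MTBF rating into years and into an approximate annual failure fraction; it assumes drives run around the clock and fail at a constant rate, and the simple ratio is only a good approximation while the result is small.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 power-on hours for a drive running 24x7


def mtbf_years(mtbf_hours: float) -> float:
    """Express an MTBF rating in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def annual_failure_fraction(mtbf_hours: float) -> float:
    """Approximate fraction of a large drive population expected to fail per year.

    Assumes a constant failure rate; the ratio HOURS_PER_YEAR / MTBF
    approximates the true rate well while the result is small.
    """
    return HOURS_PER_YEAR / mtbf_hours


rating = 1_000_000
print(f"{mtbf_years(rating):.0f} years")  # about 114
print(f"1 in {1 / annual_failure_fraction(rating):.0f} drives per year")
print(f"{1000 * annual_failure_fraction(rating):.1f} failures per 1,000 drives per year")
```

For a 1,000-drive shop, a 1-million-hour rating still means planning for several drive replacements every year.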

[Figure: Seven Ways to Keep Media Rot at Bay]

[Figure: Storage by the Numbers]

In the event of a drive failure, a RAID controller will rebuild the array using a replacement or hot-spare drive. Rebuilding a 14-drive array of 500 GB SATA drives requires reading the entire contents of the remaining 13 drives, or 5.2x10^13 bits (13 x 500 GB x 8 bits per byte). Against the SATA drives' rating of one unrecoverable read error per 10^14 bits read, that's an expected 0.52 errors per rebuild, or roughly a 40 percent chance that some data will be lost during a RAID array rebuild. This is a strong argument for using smaller RAID sets or, even better, a double-parity scheme like RAID 6 that will let the array-rebuild process accommodate unrecoverable read errors without data loss.

Limiting factors in hard drive data life are twofold. First is thermal magnetic decay, the process by which all magnetic storage devices just, well, fade away. As bit densities increase, so does the effect of decay. Modern hard drives should hold data for 20 years or so, but vendors promise just 10 to be on the safe side.

The bigger worry is array obsolescence; vendors typically promise service and spare parts for only five years. An EMC IP4700 announced in 2000 has officially reached end of life--and support for it now costs four times what it did. This means you'll have to migrate your data to a new system every few years.
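The rebuild risk discussed above can be checked from the figures given in the example: 13 surviving 500 GB drives and spec-sheet error rates of 1 in 10^14 (SATA) or 1 in 10^15 (Fibre Channel/SCSI) bits. This Python sketch treats read errors as independent events at the rated rate, a Poisson approximation; real drives may do better or worse than their ratings.

```python
import math


def rebuild_bits(surviving_drives: int, drive_bytes: float) -> float:
    """Bits that must be read from the surviving drives to rebuild one failed drive."""
    return surviving_drives * drive_bytes * 8


def p_unrecoverable_error(bits_read: float, bits_per_error: float) -> float:
    """Probability of at least one unrecoverable read error while reading bits_read.

    Assumes independent errors at the spec-sheet rate (Poisson approximation).
    """
    expected_errors = bits_read / bits_per_error
    return -math.expm1(-expected_errors)  # 1 - e^(-expected), computed accurately


bits = rebuild_bits(13, 500e9)  # 13 surviving 500 GB (decimal) drives
for label, rate in (("SATA, 1 in 10^14", 1e14), ("FC/SCSI, 1 in 10^15", 1e15)):
    print(f"{label}: {bits:.2e} bits read, P(error) = {p_unrecoverable_error(bits, rate):.0%}")
```

With SATA-class ratings the chance of hitting at least one bad sector during the rebuild comes out at roughly 40 percent; with 10^15-class drives it drops to about 5 percent. RAID 6's second parity block is what lets the controller recover from such an error mid-rebuild.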

One interesting application of hard drives for archival storage is the LOCKSS (Lots Of Copies Keep Stuff Safe) program (www.lockss.org/lockss/Home). This network of servers at libraries preserves academic journals. The servers periodically hash data items and exchange the hash values to identify copies that may have succumbed to undetected bit rot and replace them with data from the remaining valid copies.
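The hash-and-compare idea behind LOCKSS can be illustrated with a simple majority vote. This Python sketch is a simplification, not the LOCKSS polling protocol itself, which is considerably more involved; the function name, site names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from collections import Counter


def audit_and_repair(copies: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Compare hashes of every site's copy and repair the ones that disagree.

    A simplified majority vote for illustration only; SHA-256 is an
    arbitrary choice of hash.
    """
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in copies.items()}
    winning_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        # Without a clear majority we cannot tell which copy is authentic.
        raise ValueError("no majority among copies; manual review needed")

    good_copy = next(copies[site] for site, d in digests.items() if d == winning_digest)
    damaged = sorted(site for site, d in digests.items() if d != winning_digest)
    repaired = {site: (good_copy if site in damaged else data) for site, data in copies.items()}
    return repaired, damaged


library_copies = {
    "library-a": b"Journal of Examples, vol. 12",
    "library-b": b"Journal of Examples, vol. 12",
    "library-c": b"Journal of Exampl\x00s, vol. 12",  # silent bit rot
}
fixed, damaged = audit_and_repair(library_copies)
print("repaired:", damaged)
```

Only hashes need to cross the network during the vote; the full data moves only when a damaged copy has to be replaced, which is what makes the approach practical across many sites.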

Copan's MAID (Massive Arrays of Inactive Disks) is another innovative use of hard disks for archiving. MAID spins down RAID sets when they're not being accessed, reducing wear, power and cooling costs. Copan's systems also perform periodic "disk aerobics," spinning drives up and checking their integrity (read more about MAID on page SIM12).
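Copan's actual scheduling isn't described here, but the general idea of periodic "disk aerobics" can be sketched as a policy. In this hypothetical Python example, drives whose last integrity check is older than a chosen interval are due, oldest first, and only a limited number are woken at once so the power and cooling savings of idle disks are preserved. The interval, drive names and concurrency limit are all assumptions.

```python
from datetime import datetime, timedelta


def drives_to_exercise(last_checked: dict[str, datetime], now: datetime,
                       interval: timedelta, max_spinning: int) -> list[str]:
    """Pick idle drives overdue for a spin-up-and-verify pass (hypothetical policy).

    Drives whose last check is at least `interval` old are due; the oldest
    are chosen first, and at most `max_spinning` are woken at once.
    """
    overdue = [drive for drive, checked in last_checked.items() if now - checked >= interval]
    overdue.sort(key=lambda drive: last_checked[drive])  # oldest check first
    return overdue[:max_spinning]


now = datetime(2006, 9, 8)
history = {
    "shelf1-d01": datetime(2006, 7, 1),
    "shelf1-d02": datetime(2006, 8, 30),
    "shelf2-d07": datetime(2006, 6, 15),
}
print(drives_to_exercise(history, now, interval=timedelta(days=30), max_spinning=1))
```

Here two drives are overdue, and with only one allowed to spin at a time the one checked longest ago goes first.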

Removable Hard Disk

Many small-business IT groups have never had a good relationship with tape, partially due to low-end tape products from vendors like Colorado Memory that were, frankly, not quite up to the task. In the last few years, some of these shops have turned to FireWire and USB external hard drives as a backup medium.

We've been using hard drive caddies in our lab systems for years, so we can easily switch systems that don't natively support hot swapping back and forth between operating systems or products under test. Readily available from vendors like Kingwin and StorCase Technology for about $25, they simply enclose a standard drive in a plastic or aluminum carrier and use an RJ-21X or DIN connector to mate with the dock.

At least one vendor, Idealstor, has used this technology for disk-to-disk backup, putting drive docks in a backup appliance running Windows Server and adding a driver to make the docked hard drives appear to Windows as removable, rather than fixed, storage. This lets backup applications extend a backup job across multiple disks and track offline drives, like other removable media.

Recently, several vendors have taken the concept of putting a hard drive in a cartridge to the next level. Quantum's GoVault, ProStor Systems' RDX (also sold by Tandberg), and Imation's Ulysses and Odyssey all take 2.5-inch mobile drives, which have significantly better nonoperating shock resistance than desktop or enterprise drives, and package them in sealed cartridges with ESD (electrostatic discharge) protection and additional shock absorbers so they can survive a drop from a desktop to a hard floor without data loss. GoVault, Odyssey and RDX are desktop solutions for the SOHO/SMB markets with docks that accept a single drive. Ulysses uses cartridges that are the same size as LTO tapes and a docking "drive" that is the same size as an LTO tape drive. The drive also uses the same interface and command set as an LTO tape drive, so Ulysses technology can be used in LTO tape libraries with minimal re-engineering, bringing new meaning to the term "virtual tape library."

The real problem with using removable hard drives as a long-term storage medium is that no one has done any research on how long you can keep a drive on the shelf and expect it to both spin up and be able to retrieve data. No drive vendor will publish a drive shelf life, but we'd estimate five to 10 years at the outside. Without MAID-like drive maintenance periodically spinning drives up and down, we can't recommend removable storage as an archive medium.

Consumer Optical Disk

At first glance, recordable CD and DVD disks look like a good archival medium. DVDs have reasonable storage capacities, and high sales volumes have made media cheap. Unfortunately the same factors that have driven media prices down have resulted in inferior media flooding the market.

Vendors make all kinds of wild claims about CD-R and DVD-R media life. In just a few minutes on the Internet we found media with claimed lifetimes of 50, 100, even 300 years and warranties that promise a new disk if you lose your data!

In reality, industry tests, including one published in the Dutch magazine PC-Active, have found that low-quality recordable media could have a life of just a few years. Different vendors use different dye formulations and manufacturing processes that can have a significant effect on disk life due to factors like dye degradation and corrosion of the aluminum/silver alloy reflective layer.

Even if you use high-quality media, the naked disks are subject to damage in handling. The polycarbonate substrate used in these media has admirable optical characteristics--it's used for most eyeglass lenses--but it's rather soft and susceptible to scratching and scuffing. CDs are also vulnerable on the top surface, which is just a thin layer of lacquer covering the reflective layer. Writing on a CD-R with a ballpoint pen can deform the reflective layer and destroy data. DVDs are a polycarbonate sandwich, eliminating this vulnerability, but there are also concerns about eventual delamination. We've seen cases where a slightly damaged or unbalanced disk shattered in a high-RPM drive, killing both drive and disk. Bottom line, the list of things to worry about if you're using consumer-grade CD/DVD for backup vastly exceeds the space we have here.

Some vendors, particularly Mitsui Advanced Media and Imation's Memorex brand, make media especially for archival storage that use more stable phthalocyanine dye, a gold reflective layer to maximize data life, and a scratch-resistant coating on the bottom.

Just coming to market are the next generation of consumer optical formats, Blu-Ray and HD-DVD, which boast higher capacities but will likely be subject to the same problems as current CDs and DVDs.

Although a few vendors, including Hitachi and PowerFile, still make DVD-R and DVD-RAM libraries, others, including Plasmon, are leaving the market because they find the cost of supporting users who aren't careful enough when buying media too high. If you still want to go this route, NIST, the National Institute of Standards and Technology, has done extensive research on CD and DVD storage. Its Special Publication 500-252, available in PDF form online, is the best guide to the care and feeding of CDs and DVDs available.

Professional Optical Disk

In addition to the common optical disk technologies for consumer electronics applications, some optical technologies are designed from scratch for archival data use, emphasizing reliability rather than low cost. Unlike CDs and DVDs, professional optical formats use concentric, rather than spiral, tracks and spin at a constant angular velocity to speed up random access, eliminating the spindle-speed change delay inherent in constant linear velocity systems. Another contributor to professional optical disk reliability is that media are encased in a protective cartridge with a shutter that opens only when loaded in a drive, protecting the disk from light, dust, scratches and other environmental hazards. Newer systems even blow filtered air through the cartridge when it's inserted in the drive to clear off any dust contamination.

Until the recent release of WORM tape formats and non-modifiable disk systems, magneto-optical disks were the only game in town for non-modifiable archives. Vendors, including Breece Hill/MaxOptix, Fujitsu, Plasmon and Sony, claim that MO disks provide reliable data storage for 30 years or more. Except for complaints about balky robotics on some MO jukeboxes, reports from real-world users indicate that most shops are having no trouble reading data written 10 or even 15 years ago.
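To see why constant linear velocity hurts random access, compare the spindle speed a CLV drive needs at the inner and outer edges of the data area. This Python sketch assumes CD-like figures (about 1.3 m/s at 1x, with data running from roughly 25 mm to 58 mm in radius); a constant angular velocity drive never changes speed, so it skips this delay entirely.

```python
import math


def rpm_at_radius(linear_velocity_m_s: float, radius_m: float) -> float:
    """Spindle speed needed to move the track past the head at a fixed linear speed."""
    circumference = 2 * math.pi * radius_m  # metres of track per revolution
    return linear_velocity_m_s / circumference * 60


# Assumed CD-like figures: ~1.3 m/s at 1x, data from ~25 mm to ~58 mm radius.
VELOCITY = 1.3
INNER, OUTER = 0.025, 0.058

inner_rpm = rpm_at_radius(VELOCITY, INNER)
outer_rpm = rpm_at_radius(VELOCITY, OUTER)
print(f"CLV: {inner_rpm:.0f} rpm at the inner edge, {outer_rpm:.0f} rpm at the outer edge")
print(f"a full-stroke seek needs a {inner_rpm / outer_rpm:.2f}x change in spindle speed")
```

A seek from the inside to the outside of the disk forces the motor to slow by more than a factor of two before data can be read, and the ratio is set purely by the two radii, so faster drive speeds don't make it go away.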

Unfortunately, while hard drive capacities have soared from 300 GB to 750 GB, MO technology basically ran out of steam, topping out at 9.1 GB per disk--and getting even that much requires flipping the disk over. As organizations have more data and are being asked to keep it longer than ever, MO sales are falling off as storage groups search for a denser medium.

Some see Plasmon's UDO (Ultra Dense Optical), released in 2004, as the logical successor to MO technology. UDO packs 30 GB of data on a cartridge the same size as a 9.1 GB MO disk using a blue laser and phase-change materials. Because UDO uses the same size cartridge as MO, library vendors can, and do, use the same robotics for MO and UDO libraries, just as tape library vendors can address multiple capacity targets by using different capacity LTO or DLT drives.

UDO offers the same projected 30-year media life as MO technology, and Plasmon has talked up a road map to boost capacity to 120 GB over the next five or six years. In addition to WORM and rewriteable media, Plasmon offers special compliance media that provides for unrecoverable file erasure.

Sony's 23 GB PDD (Professional Disk for Data) format, also introduced in 2004, looked like it could give UDO a run for its money, but OEM acceptance has been lacking, and Sony seems to be concentrating on Blu-Ray.

Holographic storage devices that can store astounding amounts of data on tiny media devices were once the province of science fiction writers, but InPhase Technologies and Maxell say the future is now. Using DLP-like spatial-light modulators developed for video projectors and CCD sensors from digital cameras, the InPhase holographic drive stores 300 GB of data throughout the entire 2.5 mm thickness of the disk's media. Sample units are now in use at media companies, including Turner, and production units are expected to be generally available in about a year. InPhase estimates that drives will sell for about $17,000 and media for $150. The company has a three-generation road map to boost capacity to 1,200 GB per disk. We certainly hope to be around to see that.

Two Pieces of the Puzzle

Just having valid bits on your storage media isn't enough to access your data when that subpoena hits your desk. You'll also need the appropriate hardware and software to read it with: If your idea of an archive is sending end-of-month backups off for long-term storage, you could be in for a rude awakening.

The reality is, no known format can be counted on for more than 30 years, 50 at the outside. When it archives highly valuable data, the U.S. military mothballs complete systems, including tape drives and computers running the software that wrote the data to the media, but few organizations have that luxury. For now, our best advice is to keep accurate records of how all data was stored and plan to migrate data from one format to another every 10 years or so.
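Keeping accurate records and migrating every 10 years or so boils down to a manifest plus a schedule. This Python sketch shows one possible shape; the ArchiveRecord fields are hypothetical, and the schedule simply assumes a copy forward at each interval until the retention period ends, when the data can be disposed of rather than migrated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveRecord:
    """Hypothetical manifest entry describing how one archive set was written."""
    volume_id: str
    media_format: str     # e.g. "LTO-3" or "UDO WORM" (example values)
    drive_model: str      # drive the media was written on
    software: str         # application *and version* that wrote the data
    written_year: int
    retention_years: int
    sha256: str           # checksum of the archived data, to verify after each copy

    def migration_years(self, interval_years: int = 10) -> list[int]:
        """Years in which the set must be copied to current media.

        No migration is scheduled for the year retention ends, since
        the data can be disposed of then.
        """
        return [self.written_year + offset
                for offset in range(interval_years, self.retention_years, interval_years)]


record = ArchiveRecord(
    volume_id="T0001",
    media_format="LTO-3",
    drive_model="example drive",
    software="example backup suite 1.0",
    written_year=2006,
    retention_years=100,
    sha256="0" * 64,
)
print(record.migration_years())
```

For the 100-year retention some schools apply to permanent records, a 10-year cycle means nine migrations, each one an opportunity to verify the stored checksum and update the manifest for the new media.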

Howard Marks is founder and chief scientist at Networks Are Our Lives, a network design and consulting firm in Hoboken, N.J. Write to him at [email protected].

Executive Summary

All across corporate America, C-level execs are turning their attention to data storage and archiving as never before. There's nothing like the threat of incarceration to get someone to really focus. Vendors, smelling money to be made, have found a way to take just about every storage product and spin it as an "archiving/compliance solution." Sorting through the marketecture is a full-time job.

To make things even more confusing, how you can store your data and how long you have to keep it depend on which laws and regulations apply to your organization. And highly regulated industries, like securities brokers and pharmaceutical companies, have to maintain records in a non-rewriteable and non-erasable format.

As if that's not enough to worry about, there's also the threat of bad processes--something you discover only when you recall a box of tapes and find they're OnStream ADR tapes and you don't have an ADR tape drive anymore, or they weren't made with your current backup program and no one on staff remembers what backup program you used in 1991. In this article we discuss the options available to keep media rot and poor processes at bay and data available when you need it.
