Of Backups and Archives

The time has come for us all to stop holding backup tapes for years at a time and pretending they're an archive.

Howard Marks

May 21, 2009

We all have words or phrases that make our blood boil. For me, "That's the way we've always done it" is near the top of the list. Of course, it really means "I don't have a good reason for why we do things that way." In the 25 years I've been an independent consultant (which really means change agent), I've lost count of the times I've been hired to help an organization clean up some process only to hear "that's the way we've always done it," as if historical precedent should be the primary driver of the planning process.

So let me say once and for all -- the time has come for us all to stop holding backup tapes for years at a time and pretending they're an archive. While old DLT7000 or, even worse, DDS tapes at Iron Mountain may meet the legal definition of retention, they don't make a useful archive.

The existential difference between backup repositories and archives isn't the media they use or the hardware they're built on but their purpose. As a writer, I find this clearest in the language we use to describe the process of getting data back from each type of data store.

We make backups in order to restore things like servers, databases, file systems, mailboxes or even individual files or email messages to their previous condition should they be lost, damaged, deleted or corrupted. Restores, in general, return things to their original place and condition so they can be used for their original purpose.

Archives, on the other hand, exist so data can be retrieved. Once retrieved, that data is usually used in a different way than when it was originally created. Emails can be restored to be answered or acted on, or they can be retrieved to settle an argument, legal or otherwise.

Therefore, backup repositories are organized by context: where the data was when it was backed up and when the backup ran. Actually accessing the data requires restoring it and usually reconnecting the applications that support it. Anyone who's ever tried to recover data from an Exchange 5.5 backup can attest to just how much effort that takes.

On top of that, most backup applications are designed to restore data that's been backed up recently, keeping just a few months to a year of index data, so just figuring out which tape has the June 14, 2006 backup of the executive home folders is a project.

Archive your files, email, etc. with Mimosa Nearpoint, Enterprise Vault, MetaLogix PAM, Atempo's Digital Archive, EMC SourceOne or any of the seemingly hundreds of other archiving applications on the market, and it builds an index not just of your data's location and backup time but of its content as well as its context, and that full-text index lasts as long as you've told it you want to retain the data. Now you can search for documents from June 1-20, 2006 that include the keywords "Johnston, Smythe and harassment".
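
To make the difference concrete, here's a back-of-the-napkin sketch in Python of what a content index buys you that a backup catalog doesn't. It's purely illustrative -- the messages, the query and the code are mine, not any vendor's API -- but it shows how indexing content up front turns "find everything from early June 2006 that mentions these names" into a single cheap lookup:

    # Toy illustration of an archive's content index (not any product's real API):
    # every message is indexed by date AND by the words in its body, so a date
    # range plus keywords becomes one quick query instead of a restore project.
    from datetime import date

    # Hypothetical archived messages: (id, date, text)
    messages = [
        (1, date(2006, 6, 5),  "Re: Smythe complaint about Johnston"),
        (2, date(2006, 6, 14), "Executive home folder cleanup schedule"),
        (3, date(2006, 7, 2),  "Johnston harassment claim, please advise"),
    ]

    # Build a simple inverted index: word -> set of message ids.
    index = {}
    for msg_id, when, text in messages:
        for word in text.lower().replace(",", " ").split():
            index.setdefault(word, set()).add(msg_id)

    def search(start, end, keywords):
        """Return ids of messages in [start, end] containing any of the keywords."""
        by_keyword = set()
        for kw in keywords:
            by_keyword |= index.get(kw.lower(), set())
        return [m for m, when, _ in messages if start <= when <= end and m in by_keyword]

    # "Documents from June 1-20, 2006 mentioning Johnston, Smythe or harassment"
    print(search(date(2006, 6, 1), date(2006, 6, 20), ["Johnston", "Smythe", "harassment"]))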

Of course, all that indexing takes time, so an archive solution won't Hoover up data as fast as a backup solution can. But remember that you only have to archive data once, whereas most people back up their data weekly. Good archive solutions do single-instance storage and compress files, then store them in multiple locations, which also takes some time but reduces storage space and reduces the need to back up the archive.
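
And if you're wondering what single-instance storage amounts to under the hood, the idea is simple enough to sketch in a few lines. Again, this is an illustration of the concept rather than any product's implementation: hash the content, compress and store it once, and hand every copy the same reference.

    # Back-of-the-napkin sketch of single-instance storage: identical content is
    # detected by a content hash, compressed once, and stored once.
    import hashlib
    import zlib

    store = {}  # content hash -> compressed bytes

    def archive(data: bytes) -> str:
        """Store data once, keyed by its SHA-256 digest; return the key."""
        key = hashlib.sha256(data).hexdigest()
        if key not in store:              # only pay the storage cost the first time
            store[key] = zlib.compress(data)
        return key

    def retrieve(key: str) -> bytes:
        return zlib.decompress(store[key])

    # The same attachment mailed to 500 people lands in the store exactly once.
    ref1 = archive(b"Q2 sales deck" * 1000)
    ref2 = archive(b"Q2 sales deck" * 1000)
    assert ref1 == ref2 and len(store) == 1
    print(len(retrieve(ref1)))  # the original, uncompressed size comes back intact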

Next time we'll talk about storage for archive data. Hint: Spinning rust isn't the only option.

About the Author(s)

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M. and concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real-world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at: http://www.deepstorage.net/NEW/GBoS
