The State of Data Backup Protection

Cheap disks and smoking bandwidth have changed the face of backup. In tandem with evolving technologies like de-duplication and more efficient use of VTLs, backup is hot. We explore the options.

June 22, 2007


For most of the past 20 years, making backups has involved a potentially incendiary combination of tedium and little opportunity for reward, plus high career risk if things go south. The only variation to this routine occurs when vendors try to get us excited with new versions of the same backup software or bigger, faster tape drives. Year after year we back up from disk to tape, and when it comes time to restore, we search for the right tapes. Whoopee.

But now, two trends have combined to bring big changes to backup technology. Vastly decreased costs for both disk drives and bandwidth make it worthwhile to reconsider your backup setup. It might be a pain in the neck, but maturing technologies, such as VTLs (Virtual Tape Libraries), and increasingly intelligent backup software have "evolutionized" corporate backup.

The first trend is the free fall in the cost of high-capacity ATA and SATA disk drives and arrays. Keeping a gigabyte of data on disk once cost 10 times as much as storing the same data near-line in a tape library. Since disk costs have fallen faster than tape costs, the difference is now less than 5-to-1.

Not only are SATA drives much less expensive than the Fibre Channel and SCSI drives used to host high-performance applications, but the sequential nature of writing backups to disk plays to their strengths. High-performance drives have intelligent command queuing, shorter settle times and higher rotation speeds that accelerate the kind of random I/O a database performs. But the capacity-optimized SATA drives can handle sequential I/O just as well as their pricier cousins.

The second change is that as quickly as disk space costs have fallen, so have bandwidth costs. The industry frenzy to lay more and more fiber across the world in the 1990s created a bandwidth glut that's made multi-megabit connections affordable even for residential use.

The challenge we face as system managers is to figure out which technologies, like backup to disk, are really sea changes, and which ones, like server-less backup across the SAN, will turn out to be a great idea on paper but a bust in real life.

The most obvious impact of falling disk prices has been the rapid adoption of disk-to-disk backup. Most system managers change from tape to disk as their primary backup medium to speed up their backups. Unless they're still using DLT8000 drives, they soon discover that the speed of the tape drives isn't the limiting factor in how fast most of their backups run.

Backing up to disk lets you back up more data in the same amount of time not so much by accepting data faster, but by letting more backups run in parallel. Unless it's multiplexing, which has its own problems (see "Backup Technologies Past"), a backup application can send only one stream of data to a tape drive at a time. Because tape drives in libraries are expensive, the number of drives available limits the number of backups most organizations can run in parallel.

Modern tape drives have a voracious appetite for data. They move tape at 120 inches per second and ingest data at up to 120 MBps. If a backup stream can't keep the drive fed, it must stop the tape, rewind past the point where it left off and start recording again. Not only does this increase the wear and tear on the tape drive, it also slows down the process significantly.
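The arithmetic behind the parallelism argument is easy to sketch. The numbers below are illustrative assumptions, not figures from any particular shop:

```python
import math

# Toy comparison: elapsed time to back up many slow clients with a few
# tape drives (one stream per drive) vs. a disk target that accepts
# many parallel streams. All numbers are illustrative assumptions.
CLIENTS = 20
GB_PER_CLIENT = 100
CLIENT_RATE_MBPS = 10     # each client can deliver only 10 MBps (assumed)
TAPE_DRIVES = 4           # streams that can run at once to tape

per_client_secs = GB_PER_CLIENT * 1024 / CLIENT_RATE_MBPS

# Tape: clients queue for drives in waves of TAPE_DRIVES.
waves = math.ceil(CLIENTS / TAPE_DRIVES)
tape_hours = waves * per_client_secs / 3600

# Disk: all clients stream in parallel (disk and fabric bandwidth assumed ample).
disk_hours = per_client_secs / 3600

print(f"tape: {tape_hours:.1f} h, disk: {disk_hours:.1f} h")  # tape: 14.2 h, disk: 2.8 h
```

With these assumptions the bottleneck is clearly the clients, not the tape drives; disk wins simply by letting all 20 streams run at once.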

Backing up to disk addresses both these problems. A disk array has no inherent limit on the number of backup streams it can handle and will accept data as fast as the media server can deliver within the limits of the connective fabric. Clever backup admins can schedule a lot of slow backups, like the dreaded Exchange brick-level process, to a single disk target simultaneously.

Although disk is cheap, nothing can beat the cost of tape on the shelf, which can run as little as 10 cents per gigabyte. Experience tells us that the frequency of restore requests falls off rapidly over time, so most organizations spool stale backup data from secondary disk to tape for longer retention. Tape also has the advantage of portability, so organizations that don't replicate data to a disaster-recovery site or use an online backup service can ship their data offsite for disaster recovery.
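That spooling policy amounts to a simple age test. A minimal sketch, assuming a hypothetical 30-day disk-retention threshold and an invented backup catalog:

```python
from datetime import date, timedelta

# Age-based tiering: recent backups stay on disk for fast restores;
# stale ones are spooled to tape for long retention. The retention
# period and catalog entries below are hypothetical.
DISK_RETENTION = timedelta(days=30)

catalog = [                              # (backup name, completion date)
    ("full-2007-06-20", date(2007, 6, 20)),
    ("full-2007-05-01", date(2007, 5, 1)),
    ("full-2007-03-15", date(2007, 3, 15)),
]

today = date(2007, 6, 22)
to_tape = [name for name, when in catalog if today - when > DISK_RETENTION]
print(to_tape)   # ['full-2007-05-01', 'full-2007-03-15']
```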

Aside from vendor pitches and hype, for most organizations the tapeless data center makes as much sense as the paperless bathroom.

Although faster backups may be the sizzle, faster and more reliable restores are the steak of disk-to-disk backup. It's now a given that while we may back up in preparation for a full-server restore, most restore requests are for a few files that a user "lost" in the past 30 days. Even if the tape with the data is still in the library, mounting the tape and fast-forwarding to the desired file takes a few minutes. If the tape must be found and mounted, turnaround time can stretch to days. Because even disks emulating tapes in a VTL are random-access devices, there are no mount and fast-forward delays. Files can be restored in seconds instead of minutes.

What's the best way to use disk for backup? Backup apps take differing approaches. Some, including Symantec NetBackup and Tivoli Storage Manager, can use disk purely as a cache: the data is temporarily stored to disk until a given backup job is completed, then it's spooled to tape. Others, such as Atempo's Time Navigator and BakBone's NetVault, turn one or more disk volumes into a VTL with a predefined number of tape drives and cartridge slots. Most often, though, the program just writes to disk, creating a backup file for one or more backup jobs; each backup file is then treated like a tape cartridge.


Backup To Disk Appliances

Ever since Quantum announced its DX-30 VTL in 2003, overworked system administrators have latched onto VTLs as the easy way to integrate disk into their existing backup plans. All they have to do is change the destination for some of their backup jobs to the new VTL. Because the VTL connects to the SAN like a real tape drive and mimics a real tape library, no other changes are needed.

Regardless of how well a VTL does its job, though, it's still emulating a tape library and subject to the limitations of that technology. Once a tape is written, it can be appended to, but the data on the tape can't be modified or deleted. If a virtual tape contains some successful backups and the data from one or more failed backup jobs, the backup administrator can't delete the data from the failed jobs without overwriting the whole set.

A few years ago we would have said that VTLs are great for overburdened admins, but not for long-term use. As applications added their own backup-to-disk functions, we expected administrators to redesign their backup processes to save the added cost of a VTL. We predicted they'd take advantage of the greater flexibility in media management that treating disk as disk provides.

If a backup app really understood disk storage, it could delete files at the end of their retention period; let administrators delete the data from temporary, failed or partial backups; and show the amount of available space on the target. Unfortunately, that kind of flexibility is still a pipe dream. None of these tasks is easy with current tools, nor possible with tapes--virtual or not.

But the VTL vendors have made progress. The most significant advance is data de-duplication. With data de-duping, the VTL identifies files, and portions of files, that have been backed up before. Instead of saving an additional copy of that data, it uses a pointer to the previous copy. End users running data de-duping devices report that they can store 10 or 20 times as much data on their backup appliances as the disk capacity of the VTL would suggest. Even with the additional cost of a VTL over raw disk, de-duplication makes disk backup less expensive than tape in the library.
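The core idea can be sketched in a few lines. This is a simplified illustration, not any vendor's algorithm: real appliances typically use variable-size chunking and persistent fingerprint indexes rather than fixed 4-KB chunks and an in-memory dictionary.

```python
import hashlib

CHUNK_SIZE = 4096
store = {}   # fingerprint -> unique chunk data (the appliance's disk pool)

def dedupe_write(data: bytes) -> list:
    """Store data, returning a 'recipe' of fingerprints (pointers to chunks)."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:       # new data: store one copy
            store[fp] = chunk
        recipe.append(fp)         # duplicate data costs only a pointer
    return recipe

def restore(recipe) -> bytes:
    return b"".join(store[fp] for fp in recipe)

# Two backups that share most content: only the new chunk costs space.
backup1 = b"A" * (8 * CHUNK_SIZE)
backup2 = b"A" * (8 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
r1, r2 = dedupe_write(backup1), dedupe_write(backup2)
print(len(r1) + len(r2), "logical chunks,", len(store), "stored")  # 17 logical chunks, 2 stored
```

The second backup, though it repeats eight chunks of the first, adds only one new chunk to the store, which is why nightly backups of slowly changing data compress so dramatically.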

The list of products that now offer this de-duping feature is long. It includes Diligent's ProtecTIER, Quantum's DXi series, Sepaton's Deltastor and the latest version of FalconStor's VTL software, which is OEM'd by vendors including EMC in its Clariion Disk Library.

The other addition VTLs have made is replication. Once data is de-duplicated, backup appliances from FalconStor and Quantum can replicate the new data across an IP network to another remote backup appliance. For applications that don't require short RPOs (Recovery Point Objectives) this is a cheap way to get data offsite.

Finally, recognizing the media-management advantages of a file-based solution, vendors including Data Domain and Quantum provide a NAS interface to their backup appliances as well as tape library emulation.

Synthetic Backups Go Down Market

Organizations using the typical weekly full backup and nightly incremental process have to manage two backup windows. Nowadays many organizations must retain data for longer periods to comply with SOX and HIPAA; other companies just don't bother to delete old data. As a result, the time needed for incremental backups can approach that allotted for full backups. To solve this problem, specialized backup applications like EMC's Retrospect and Tivoli Storage Manager make a full backup only the first time they protect a server or file system. From that point on, backups are incremental. When a backup administrator wants to restore a file, the application finds the most recent version. If an admin needs to send a full backup offsite for disaster recovery or archiving, the application copies the latest version of each file from all the backups of the server. It then builds a synthetic full backup set.
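A minimal sketch of the synthetic-full idea: merge one original full with later incrementals, letting the most recent version of each file win. The file names and versions are hypothetical, and file deletions, which real applications must also track, are ignored here for brevity.

```python
# Day-0 full backup plus incremental-forever backups (hypothetical data).
full = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
incrementals = [
    {"a.txt": "v2"},                   # day 1: only a.txt changed
    {"b.txt": "v2", "d.txt": "v1"},    # day 2: b.txt changed, d.txt is new
]

def synthetic_full(full, incrementals):
    """Build a full backup set without re-reading the client: latest version wins."""
    merged = dict(full)
    for inc in incrementals:           # apply in order; later backups overwrite
        merged.update(inc)
    return merged

print(synthetic_full(full, incrementals))
# {'a.txt': 'v2', 'b.txt': 'v2', 'c.txt': 'v1', 'd.txt': 'v1'}
```

The client is never touched after day 0; the "full" is assembled entirely from data the backup server already holds.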

While it's possible to do this kind of incremental-forever backup with a tape library, building a synthetic full backup or consolidating the backup data to delete old backups from the tapes creates a lot of wear and tear on the tape library.

Using disk as a backup target, even in the form of a VTL, makes synthetic full backups much more practical. As a result, just about every backup program targeted at enterprise or midmarket sites now includes synthetic full backups as an option. Running incremental backups to a disk target or VTL, then creating synthetic full backups on tape for longer-term retention, can also reduce the amount of disk space needed to store 30 days of data online by a factor of two or three.

The Rise Of Online Backup

Nowhere has the combination of falling disk and bandwidth costs had a bigger impact than in the astounding growth of online backup services and software. Online backup services have been around since Connected Corp., now part of Iron Mountain, introduced the service in the 1990s. These early offerings were designed to protect data on laptops, not servers. They were also hamstrung by the low-bandwidth connections then available to remote users. Today, standalone users with broadband connections can choose backup services from hundreds of vendors.

Business users also have a range of service providers to choose from, including services that support online backup for both file servers and common applications like Microsoft's Exchange and SQL Server. These services should be especially attractive to SMBs, which have a generally abysmal record of changing tapes and sending them offsite on a regular schedule. But there are two significant issues: restore times and security. The fear that an outsider might get access to internal company data is the more significant.

Vendors, including LiveVault (also owned by Iron Mountain) and eVault (a Seagate company), have addressed the restoration issue by offering inexpensive backup appliances that store a local backup copy. This enables restores at LAN speeds.

But security, particularly for SMBs, is a tougher nut to crack. The software used by backup providers encrypts data at the source using an encryption key known only to the user. However, many small-business owners have watched too many episodes of 24, in which secret units break encryption codes as easily as cracking walnuts. In the real world, commercial encryption keeps backup data safe from anyone short of the NSA.

Large companies with remote offices face many of the same backup challenges as small offices. Most remote offices don't have resident IT staff to take care of tasks like changing tapes and sending them off-site. Instead, these tasks fall to office managers and others who may not understand the importance of backing up this data.

Backup applications designed specifically for remote-office backup, including Asigra's Televaulting, EMC's Avamar and Symantec's NetBackup PureDisk, let a backup administrator at a central data center manage the process for a large number of remote offices. These products extend the kind of data de-duplication that VTLs do across multiple sites. This means data that has been backed up from one remote office doesn't have to be sent from other offices, greatly reducing the bandwidth needed. Asempra's Business Continuity Server provides similar management features and application-aware CDP.

Encryption Everywhere

In 2002, California passed a law requiring companies to issue a notice when they lose track of people's personal information. Not surprisingly, state legislatures across the country have followed suit. Each time news of a set of misplaced tapes comes out, senior managers turn to their CIOs and say "don't let this happen to us."

As recently as two years ago, encrypting your backup tapes to avoid these problems meant buying dedicated Fibre Channel encryption appliances from Decru or Neoscale Systems. Today you can encrypt your data in your backup application, including EMC NetWorker, IBM Tivoli Storage Manager and Symantec Backup Exec, or at the tape library. Next-generation LTO tape drives, expected to ship by year's end, also include an encryption chip.

It's surprising that many companies, even those unhappy with their backup systems, are reluctant to make changes. But when you consider that the original process of developing a backup system might have been a real pain in the neck, maybe it makes sense. The thought of doing it all over again is just too painful to contemplate.

Well, there is a panoply of practical solutions: backup to disk; backup-volume-reduction technologies including CDP, incremental forever and data de-duplication; and advances in online software for remote and small offices. Any of these can reduce your pain. It will still require careful planning, but remember--change is good.



» Applies to organizations with 50 to 150 or so servers in one data center

» Use low-cost disk array for backup; disk to disk to tape

» Create backup jobs so each file holds data from just one server or file system

» Replace tape drives in remote offices with online backup software or service

» Encrypt tapes going off-site in backup application or tape library


» Use backup appliance or virtual tape library with data de-duplication for primary target

» Create backup jobs so each virtual tape or file has a single server's data

» Replace tape drives in remote offices with backup application supporting global data de-duplication

» Replicate de-duped data between backup appliances at alternate sites for applications not protected by real-time replication

» Use Fibre Channel encryption appliances to increase performance and to encrypt off-site tapes

Backup Technologies Past

MULTIPLEXING

It used to be that there were two key differences between enterprise-class backup applications, like Symantec's NetBackup and EMC's Networker, and workgroup backup products, such as Symantec's Backup Exec. The enterprise products could manage multiple media servers running backups to tape drives from a master server that managed the schedule. They could also interleave, or multiplex, multiple backup streams from different sources onto a tape simultaneously.

Multiplexing keeps the tape drive fed, speeding up backups by running multiple processes in parallel and avoiding shoe-shining. But there's no such thing as a free lunch--multiplexing increases restore times because the restore job must skip the data from servers other than the one it's restoring.

Because a disk backup target can simply write multiple data streams to multiple backup files or virtual tapes, multiplexing doesn't speed up backups. It just complicates media management, because each backup file holds data from multiple sources and can't be deleted until none of that data is still needed. Unfortunately, most backup application vendors charge license fees on a per-tape-drive basis, so you have to fork out a few more dollars to run 20 backup streams to 20 virtual tape drives than to run them multiplexed to four.
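The restore penalty is easy to see in a toy model. The client names and block counts below are hypothetical:

```python
from itertools import zip_longest

# Three hypothetical client streams multiplexed onto one tape.
streams = {
    "web":  ["w0", "w1", "w2"],
    "db":   ["d0", "d1", "d2", "d3"],
    "mail": ["m0", "m1"],
}

# Multiplex: round-robin blocks from all streams onto the tape.
tape = []
for row in zip_longest(*streams.values()):
    for name, block in zip(streams, row):
        if block is not None:
            tape.append((name, block))

def restore_client(tape, client):
    """Read the whole tape, keeping only one client's blocks."""
    blocks_read, kept = 0, []
    for name, block in tape:
        blocks_read += 1          # every interleaved block must be read past
        if name == client:
            kept.append(block)
    return blocks_read, kept

blocks_read, kept = restore_client(tape, "mail")
print(blocks_read, kept)   # 9 ['m0', 'm1'] -- nine blocks read to recover two
```

Had each stream gone to its own file or virtual tape, the mail restore would have read exactly its own two blocks and nothing else.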

Tape RAID

In the early '90s, keeping tape drives fed wasn't as big a problem as just backing up servers in a reasonable amount of time. To keep up, backup applications would stripe backup data across several tape drives the same way a RAID array stripes across disks. Of course, restoring from a four-tape RAID set means you have to find and mount all four tapes.

Luckily, today we have much faster tape drives and can back up our servers to disk during the limited backup window each night, when the load of running a backup won't affect performance. Then the data can be spooled to tape later, when raw performance isn't as important.

Howard Marks, a Network Computing contributing editor, is chief scientist at Networks Are Our Lives, a consultancy in Hoboken, N.J. Write to him at [email protected].
