By Marshall Breeding
Storage for the Network: Designing an Effective Strategy
Backup Strategies
No storage system can be guaranteed never to fail. We learned that even though most RAID systems can withstand the loss of a single drive, multiple simultaneous failures can result in complete data loss. Even though you have implemented the most failure resistant storage system available, you must also be prepared for both minor and catastrophic failures. Storage systems also fall prey to human failures. Users will inevitably accidentally delete a file or directory and you will be expected to recover it.
Most organizations cannot even begin to calculate the costs of data loss. If your company trusts its data to your network, then you must guarantee that you can preserve it under all circumstances.
Your storage system must include a solid backup strategy. If your primary storage system fails, then you must be able to repair it and restore all data back to its original state.
The most common backup strategy involves routinely copying data from your primary storage system to a slower, less-expensive medium. While most organizations use magnetic tape, other possibilities include the various optical storage technologies.
Full Backup At selected intervals, you will need to make a full and complete backup of your storage system. This is your starting point if you must rebuild your data from scratch. In cases of catastrophic system failure where all data is lost or corrupted, you may need to reinitialize your storage devices and copy all data from your most recent backup set.
Incremental Backup Storage strategies also usually include incremental backups. It may not be practical to perform a full backup every day. You will want to archive all the files added or changed since the last full backup as frequently as possible. Two approaches are possible here. You might want to catch all the files modified each day in an incremental backup run. If you must rebuild your data, you would restore the last full backup plus each of the incremental runs made since then. Alternatively, you could set your incremental backups to copy all new and modified files since the last full backup, an approach often called a cumulative or differential backup. This way restoring the server will take only two tapes--the last full backup plus the last differential. Whether you use daily or cumulative incremental backups will depend on the volume of new and changed data versus the time available for the backup runs. Media costs may be a minor factor--the cumulative approach will consume more space.
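The difference between the two incremental approaches comes down to which reference point is used when selecting files. The following is a minimal sketch, not any product's actual logic; the `FileRecord` type, the file names and the day-based timestamps are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    modified: float  # modification time, in days since an arbitrary start (assumed unit)


def select_for_backup(files, since):
    """Return the paths of files modified after the `since` timestamp."""
    return sorted(f.path for f in files if f.modified > since)


def sets_needed_for_restore(daily_runs_since_full, cumulative):
    """Count backup sets read during a full restore: the full backup plus incrementals."""
    if daily_runs_since_full == 0:
        return 1
    # Cumulative runs supersede each other; daily runs must all be replayed.
    return 2 if cumulative else 1 + daily_runs_since_full


files = [
    FileRecord("budget.xls", modified=1.5),
    FileRecord("memo.doc", modified=2.5),
    FileRecord("plan.doc", modified=3.5),
]
last_full = 1.0   # full backup taken on day 1
last_daily = 3.0  # most recent daily incremental taken on day 3

# Daily incremental: only what changed since the previous incremental run.
daily_set = select_for_backup(files, since=last_daily)
# Cumulative incremental: everything changed since the last full backup.
cumulative_set = select_for_backup(files, since=last_full)

print(daily_set, cumulative_set)
```

The daily run copies less data each night but requires every intermediate set at restore time; the cumulative run copies more each night but needs only the latest set alongside the full backup.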
Scheduling Issues Schedule your backup jobs to run at the most frequent interval possible. Typically, organizations plan full backups at least once a week and incremental backups at least daily. How fast your data changes and the value of the data also come into play. Some shops will need to perform full backups daily and incremental backups several times per day. If your system is basically inactive for some period of time each night, then scheduling backups is a breeze. But most of our lives are more complicated than that. If your system must be available 24x7, then establishing an effective backup schedule can be quite a challenge.
One of the biggest issues in creating a backup strategy involves whether or not backups can be performed while the storage system is active. If you have a run-of-the-mill NetWare server, for example, and most of the data consists of word processing files, spreadsheets, and the like, then you can most likely perform backups during off-peak hours without problem. Here, open files pose the greatest problem. If a user has a file in active use, the backup system may not be able to open it to make a backup. You would expect the backup system to keep track of which files it failed to archive, retry them at a given interval and report any files missed to the system manager, who would archive them manually. Some backup systems have advanced capabilities for automatically archiving open files.
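The track, retry and report behavior described above can be sketched in a few lines. This is an illustration under stated assumptions: `archive` is a hypothetical callable that raises `OSError` when a file is in use, and the retry interval is omitted so that the example stays short (a real system would wait between passes).

```python
def run_backup(paths, archive, max_retries=2):
    """Archive each path, retry the failures, and return the files still missed.

    `archive` is a hypothetical callable that raises OSError when a file
    is locked or otherwise unreadable.
    """
    pending = list(paths)
    for _attempt in range(1 + max_retries):
        failed = []
        for path in pending:
            try:
                archive(path)
            except OSError:
                failed.append(path)  # remember it for the next pass
        pending = failed
        if not pending:
            break
    return pending  # these must be reported to the system manager


# Simulated environment: number of passes during which each file stays open.
locks = {"report.doc": 1, "ledger.db": 5}
archived = []


def fake_archive(path):
    if locks.get(path, 0) > 0:
        locks[path] -= 1
        raise OSError(f"{path} is in use")
    archived.append(path)


missed = run_backup(["memo.doc", "report.doc", "ledger.db"], fake_archive)
print("archived:", archived)
print("missed:", missed)
```

In this simulated run, a file that is closed by the second pass is picked up automatically, while one that stays open through every pass ends up in the report of missed files.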
The more difficult problem for backup strategies concerns transactional databases. Here the entire database may be a single file, but with millions of records. In the simplest case, you would close the database and perform the backup. If this is your strategy, you would want to use the fastest backup hardware possible to minimize the time your system must be inactive. Many organizations require constant availability of their online transaction processing systems. Here, the backup system must be able to more directly interact with your database. In some cases the database application can be placed into read-only, or maintenance, mode so that the backup system can get control of the files long enough to archive them.
Gaps of Vulnerability Disk failures rarely occur immediately after a backup. No matter how frequently you back up your storage system, there will be a gap between the failure and your most recent backup set.
How much data can you really afford to lose? Some operations can risk the potential loss of a day's work, provided that the chances of that actually happening are very small. But many can't afford that level of risk. For organizations with extremely valuable data, you will build as much failure resistance as possible into the primary storage system. The odds of having a data-losing failure in a RAID system are quite small indeed, but possible. You can reduce your gap of potential data loss by performing incremental backups frequently. Incremental backups don't necessarily have to use slow media. One option might involve performing incremental backups to a separate magnetic disk system hourly, and gathering the hourly backups to tape daily. Many online transaction-based systems support journalling. Each transaction not only updates the database, but it also writes the transaction to a log file. If the database must be recreated, you can restore it from the last full backup and replay all the subsequent transactions from the log file.
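Journal-based recovery amounts to starting from the backup image and reapplying each logged transaction in order. The sketch below assumes a toy key-value database and a hypothetical log format of `(operation, key, value)` tuples; real transaction logs are considerably more involved.

```python
def restore(full_backup, journal):
    """Rebuild the database from a full backup, then replay the journal in order."""
    db = dict(full_backup)  # work on a copy so the backup image stays intact
    for op, key, value in journal:
        if op == "put":
            db[key] = value
        elif op == "delete":
            db.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation: {op}")
    return db


# State captured by the last full backup (hypothetical account balances).
full_backup = {"acct-1": 100, "acct-2": 250}

# Transactions logged after that backup was taken.
journal = [
    ("put", "acct-1", 80),
    ("put", "acct-3", 40),
    ("delete", "acct-2", None),
]

recovered = restore(full_backup, journal)
print(recovered)
```

Because every transaction after the backup is in the log, the gap of vulnerability shrinks to whatever had not yet been written to the journal at the moment of failure, provided the log itself is stored somewhere that survives the failure.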
Managing Backups The safety of your organization's data depends on backup tasks being performed on schedule. The operation of backup tasks requires a high level of discipline. In general, you should rely on software that automatically schedules backup tasks and not depend on network managers or operators to manually perform them. Modern backup software should be expected to automate almost all aspects of your backup strategy.
Device management Expect your backup software to recognize and manage all the devices used in your backup system. If you have multiple tape drives, or tape changers, you should be able to direct the data to the appropriate device and media.
Media management Most backup strategies involve many media units--tapes, platters or cartridges. The backup set might include monthly archives, weekly full backups and daily incrementals. Most backup software systems will keep track of the media involved, tell you how to label each unit and specify which one to load.
File management Advanced backup systems will track the status of each file on the storage system. These systems maintain a database that includes information on when each file was backed up, what tape it is on, and the like. This information is critical for file restorations. If a user accidentally deletes a file, you should be able to find it in the backup system's database and have it tell you which tape includes the most recent version. If a file becomes corrupted, you may be asked to restore an older version.
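A file catalog of this kind can be pictured as a mapping from each path to the list of backups that contain it. The following is a rough sketch only; the `BackupCatalog` class, the tape labels and the dates are invented for illustration and do not reflect any particular product's database.

```python
from collections import defaultdict


class BackupCatalog:
    """Tracks which tape holds each backed-up version of a file."""

    def __init__(self):
        # path -> list of (backup_date, tape_label); dates are ISO strings
        # so that plain string comparison orders them chronologically.
        self._versions = defaultdict(list)

    def record(self, path, backup_date, tape_label):
        self._versions[path].append((backup_date, tape_label))

    def latest(self, path):
        """Most recent copy of a file, e.g. to recover an accidental deletion."""
        versions = self._versions.get(path)
        return max(versions) if versions else None

    def latest_before(self, path, cutoff_date):
        """Most recent copy made before cutoff_date, e.g. to undo corruption."""
        older = [v for v in self._versions.get(path, []) if v[0] < cutoff_date]
        return max(older) if older else None


catalog = BackupCatalog()
catalog.record("plan.doc", "1998-03-01", "WEEKLY-01")
catalog.record("plan.doc", "1998-03-03", "DAILY-TUE")

print(catalog.latest("plan.doc"))
print(catalog.latest_before("plan.doc", "1998-03-03"))
```

The first lookup answers "which tape has the newest copy?", and the second answers "which tape has the last copy made before the file went bad?".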
Error Reporting Expect the backup system to provide reports on the status of each backup run. Did it run completely? Were there files missed? Were whole server volumes missed? You may want status and error reports printed or sent by e-mail. There may be some errors where you need the system to send a message to your pager or trigger an alarm to an SNMP console.
Error Recovery When problems occur in a backup operation, some can be recovered from automatically. If files are missed on the initial pass of the backup run, set the system to retry them at the end of the run. If the backup system fails to log in to a server, it may automatically create a make-up job to run later.
Backup Software Platforms Where do you expect to run your backup software? Will it run from a workstation on the network or will it operate on a server? There are some low-end solutions where the backup software runs on a client computer, archiving data from servers to a local tape drive. Most of the more advanced systems, however, run on network servers. Backup management requires a high-performance, multitasking environment, with scheduling capabilities. Servers offer these capabilities better than client computers. You will need to ensure that the backup software you choose is compatible with your server platform. Most of the systems available will operate under NetWare, Windows NT and the various flavors of Unix.
Agents The backup process can be made to operate more efficiently with the cooperation of the target system. It is common for backup systems to use programs called agents which manage the communication between the target system and the backup host. Agents can be used to give the backup host access to workstations on the network so that their local drives can participate in the backup system. You may also need an agent to back up target systems that differ from the platform of your backup host. For example, if your backup host is a NetWare server, you may need an agent on Unix and NT hosts to back up their file systems. Agents can also optimize the performance of backups by taking advantage of the processor on the target system to pump data to the backup server.
Media Options The primary concerns in selecting a backup medium for your storage system involve increasing the volume of data that can be transferred to each tape and boosting performance. The more data that can be placed on each tape, the fewer tape changes must occur during each backup session. The speed of the data transfer is especially important for those systems that must be offline for backups. Faster backups mean less time offline, which can make a big difference for the organization's data center operations. The vast majority of storage systems rely on tape-based backup systems. The two main competing tape technologies are 8mm Digital Audio Tape (DAT) and Digital Linear Tape (DLT). Through advancements in tape density and compression, 8mm DAT has achieved capacities of up to 24GB per tape and can transfer data at 2.2MB/second. Even more recently, Exabyte--the leader in tape backup solutions--has increased its 8mm technology to one that supports 40GB per tape with 6MB/second sustained throughput. Digital Linear Tape has offered capacity and performance advantages over DAT. In its current form, it can sustain 5MB/second data transfer and store up to 35GB per tape. While 8mm technologies have recently surpassed these measurements, we can expect DLT to make its own improvements and sustain its competitiveness.
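To see what these capacity and throughput figures mean for a backup window, a back-of-the-envelope estimate helps. The sketch below uses the per-tape numbers quoted above; the 100GB data volume is a hypothetical workload, 1GB is taken as 1,000MB, and tape-change time and compression variability are ignored, so treat the results as rough lower bounds.

```python
import math


def backup_estimate(data_gb, tape_capacity_gb, throughput_mb_s):
    """Return (tapes required, hours of transfer time) for one full backup."""
    tapes = math.ceil(data_gb / tape_capacity_gb)
    hours = data_gb * 1000 / throughput_mb_s / 3600  # assumes 1GB = 1,000MB
    return tapes, hours


# (capacity in GB, sustained throughput in MB/second) as quoted in the text.
media = {
    "DAT": (24, 2.2),
    "DLT": (35, 5.0),
    "Exabyte 8mm": (40, 6.0),
}

data_gb = 100  # hypothetical volume of data to back up

for name, (capacity, rate) in media.items():
    tapes, hours = backup_estimate(data_gb, capacity, rate)
    print(f"{name}: {tapes} tapes, about {hours:.1f} hours")
```

For a system that must be taken offline during the backup, the hours figure is the one that matters most; for an unattended overnight run without a tape changer, the tape count may be the real constraint.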
Optical technologies such as MO or CD-R may suit some environments. For especially high-performance backups, you may want to use another set of magnetic disks, but this would be an extremely expensive alternative. The relative capacities of these media constantly increase, and the cost per MB varies. When calculating the costs, be sure to include the hardware costs as well as the media itself.
HSM: Hierarchical Storage Management
One of the fundamental problems for data storage involves the constantly growing size of the data environment. There comes a point when the cumulative volume of data exceeds the hardware's ability to accommodate it. You can deal with the problem by adding capacity to your storage system or coaxing network users to delete unneeded files. But such efforts cannot necessarily be sustained indefinitely. Organizations with large-scale storage environments should consider a more advanced storage management system.
One approach to controlling data growth involves automatic file grooming. Many of the advanced backup packages include the ability to automatically remove files that have not been accessed after a prescribed interval. The backup system would ensure that any file removed from the storage system would exist on multiple archive tapes. When users need groomed files, the files would have to be restored manually. While this process may somewhat reduce the files on the network, it may leave users uncertain about the status of their files and involves considerable manual work in restoring files.
Hierarchical Storage Management (HSM) deals with the problem of rapid data growth in a more sophisticated way by automatically transferring data files to secondary and tertiary storage systems. HSM operates completely transparently to the user--files continue to appear in directory listings even if they have been moved. The key to HSM lies in migrating the files that are least likely to be needed again. Generally, HSM systems assume that the longer a file is idle, the lower the probability it will be accessed again. The secondary and tertiary storage systems can store infrequently used files at a low cost and free up space on the primary storage system for files in active use. Data may be stored online, near online or offline depending on their frequency of use.
Let's consider a typical HSM implementation. Three levels of storage are available: a RAID system of 100 GB, an optical jukebox and an 8-mm tape system. Once the RAID system reaches a certain threshold of capacity, say 90 percent, the HSM software begins scanning the storage system for files that are candidates for migration. We might specify that any file that has not been touched for at least 90 days can be transferred to secondary storage. The files are written to optical platters in the jukebox and deleted from the RAID system. The HSM system replaces the original file with a stub that occupies little space, but maintains the directory entry. As users browse their directories, they are unaware that any files have been deleted. If a user actually needs to use a file that has been migrated, the HSM software intercepts the request and automatically restores the file to primary storage. The user will notice a delay while the optical disk is selected and mounted in the jukebox and the file is copied back to its original location. The delay should be as little as a few seconds, depending on the size of the file. Some HSM systems can use a tertiary storage option. If the migrated data exceeds the capacity of the secondary storage system, then the data might be migrated down another level. In our example, files on the optical platters that had not been accessed for a year might be archived to magnetic tape and then deleted from the platters.
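The migration pass in this example can be sketched as follows. The 100 GB capacity, the 90 percent threshold and the 90-day idle limit come from the example above; everything else is an assumption made for illustration, including the `StoredFile` type, the file names, the choice to migrate the longest-idle files first, the decision to stop once usage falls back below the threshold, and treating a stub as occupying no space.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    path: str
    size_gb: float
    idle_days: int
    migrated: bool = False  # True once only a stub remains on primary storage


def primary_usage(files):
    """Space used on primary storage; stubs are assumed to take no space."""
    return sum(f.size_gb for f in files if not f.migrated)


def migrate_idle_files(files, capacity_gb, threshold=0.90, min_idle_days=90):
    """Migrate eligible files, longest idle first, until usage is below threshold."""
    limit = threshold * capacity_gb
    moved = []
    candidates = sorted(
        (f for f in files if not f.migrated and f.idle_days >= min_idle_days),
        key=lambda f: f.idle_days,
        reverse=True,
    )
    for f in candidates:
        if primary_usage(files) < limit:
            break  # enough space has been freed (assumed stopping rule)
        f.migrated = True  # data goes to the jukebox; a stub stays behind
        moved.append(f.path)
    return moved


def recall(f):
    """Simulate a user opening a migrated file: it returns to primary storage."""
    f.migrated = False


files = [
    StoredFile("projects/current.db", 40, idle_days=5),
    StoredFile("archive/1996.dat", 30, idle_days=200),
    StoredFile("reports/q1.doc", 15, idle_days=120),
    StoredFile("scans/old.tif", 8, idle_days=95),
]

moved = migrate_idle_files(files, capacity_gb=100)
print("migrated:", moved, "usage now:", primary_usage(files), "GB")
```

With 93 GB in use on the 100 GB array, the pass migrates only the longest-idle file, which is enough to bring usage back under the threshold; the other eligible files stay on primary storage until space runs short again.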
The implementation of HSM involves several hardware and software components. On the hardware level, you will need equipment to manage your secondary storage. In most cases this will be an optical disc changer or jukebox. One could use a tape drive with an automatic media changer, but only if long file-restoration times are acceptable. The secondary storage device would connect to the same server as the primary storage system. With most HSM implementations, users do not access the secondary storage directly. The HSM software interacts with the secondary storage to place migrated files back to primary storage when needed. This configuration contrasts with a networked optical jukebox where users would access data directly. In most cases a tertiary storage medium such as magnetic tape would be used for an additional level of data migration or to perform backups of data on both the primary and secondary storage systems.
The implementation of HSM requires specialized software. Storage management involves a number of complex tasks. The HSM application software must be tightly integrated into the server's operating system. To operate transparently to end users, the HSM software must manipulate the file system and directory structure to make all files appear to be present, even when some reside elsewhere. Some means must be provided to keep track of the location of all migrated files so that they can be efficiently retrieved when needed again. The HSM software will manage capacity issues for both the primary and secondary storage system. All data must be carefully tracked.