Downtime costs vary from industry to industry, depending on how dependent a business is on technology and on its typical labor costs. Companies most dependent on automated systems, such as energy and telecommunications enterprises, accrue an average of nearly $3 million in losses for every hour of downtime, counting lost revenue and idled employees, according to an October 2000 Meta Group study. IT-dependent manufacturing companies and financial institutions suffer per-hour revenue losses of $1.5 million to $1.6 million. Health care, media and hospitality/travel companies, less dependent upon IT infrastructure, lose between $330,000 and $636,000 of revenue per hour (see "The Cost of Downtime").
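To see how those hourly figures compound over a real outage, here's a back-of-the-envelope sketch; the per-hour losses come from the Meta Group study (midpoints stand in for the ranges), while the outage durations are purely hypothetical:

    # Rough outage-cost arithmetic. Hourly figures are from the October 2000
    # Meta Group study (midpoints for the ranges); outage lengths are hypothetical.
    HOURLY_LOSS = {
        "energy/telecom": 3_000_000,          # ~$3 million per hour
        "manufacturing/finance": 1_550_000,   # midpoint of $1.5M-$1.6M
        "health care/media/travel": 483_000,  # midpoint of $330K-$636K
    }

    for industry, per_hour in HOURLY_LOSS.items():
        for hours in (1, 8, 24):              # a glitch, a workday, a full day
            print(f"{industry}: {hours}-hour outage ~ ${per_hour * hours:,}")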
But vulnerability is a relative thing. Dollar losses may not be your primary concern if you're in charge of, say, a utility company whose IT outages can leave customers without heat. Protracted outages also can translate into a loss of customer confidence. That's a major vulnerability for even the smallest just-in-time manufacturer or e-business.
If your organization can handle the hefty price, you can attack the problem of business continuity by replicating everything in your production environment at an alternate, company-owned backup facility. You can use high-speed networks and storage and server mirroring to provide instantaneous failover from one site to the other for hiccup-free disaster recovery.
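Stripped to its essentials, that architecture pairs continuous mirroring between the two sites with a monitor that redirects work when the primary stops answering. The sketch below illustrates only the failover decision, with hypothetical host names; production deployments rely on dedicated clustering and replication products rather than hand-rolled checks:

    import socket

    # Hypothetical primary and backup data-center addresses.
    PRIMARY = "dc-primary.example.com"
    BACKUP = "dc-backup.example.com"

    def site_is_alive(host, port=443, timeout=2.0):
        """Crude health check: can we open a TCP connection to the site?"""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def active_site():
        """Point traffic at the backup facility when the primary goes dark."""
        return PRIMARY if site_is_alive(PRIMARY) else BACKUP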
Most companies, however, can't afford a strategy of full redundancy. For these enterprises, specialized vendors can help replace critical IT infrastructure. Traditional business recovery-service vendors, such as Comdisco Continuity Services, Hewlett-Packard Co. Business Recovery Services, IBM Business Continuity and Recovery Services, and SunGard Recovery Services, make up one part of this market; Web-based data-center providers, such as Exodus Communications and eDeltaCom, represent a new crop. (See "RFI: Storage Disaster-Recovery Services" for a comparative look at many of these services.) The theory is that hammering out recovery logistics before an interruption occurs will speed recovery in the wake of a lightning strike or tornado.
Modern storage-recovery requirements present a problem for conventional business-continuity planning. With the proper logistics, provisions can be made to replace system platforms, networks and even user work areas quickly, but the real key to recovery is time to data. How rapidly data can be restored for use by business applications, decision-makers and customers is the ultimate determinant of successful recovery.
Given this fact, the growth in data volume, together with the proliferation of storage-platform types and topologies within a single company, creates requirements that can make or break the efficacy of every other recovery plan.
Like the potential for disaster, data is growing at an exponential rate in many companies. Conservative estimates from International Data Corp. place data growth at approximately 80 percent per year. IDC projects that stored data will grow from a not-so-measly 184,641 TB worldwide in 1999 to almost 2,000,000 TB by 2003.
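The projection is simple compounding: growing the 1999 baseline at 80 percent a year lands within a few percent of IDC's 2003 figure:

    # Compound IDC's 1999 baseline at ~80 percent annual growth through 2003.
    volume_tb = 184_641            # worldwide stored data in 1999, per IDC

    for year in range(1999, 2004):
        print(f"{year}: {volume_tb:,.0f} TB")
        volume_tb *= 1.80          # 2003 comes out near 1.94 million TB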
Much of that growth can be attributed to the Internet, e-mail, and increasingly top-heavy and "media-rich" application software. A significant percentage of data growth can be attributed to data replication -- one side effect of the lack of cost-effective data-sharing technologies. Added to the mix is an abundance of files left over from application-development work, and stale data that its creators use, forget about and never delete from their disks.
In many organizations, real data growth, excluding replication and waste, is somewhat lower than the average cited by analysts, but not by much. Few companies have the time or staff to perform accurate analyses, and automated tools for storage management in a distributed-systems environment are in short supply.
The result is a data deluge that's difficult to segregate into critical and noncritical categories. In the absence of effective classification and management tools to separate data that must be restored immediately from data that can tolerate a lengthier downtime, all data must be included in the backup process.
Given the sheer volume of data on backup tape and the comparatively slow speed of data-restoration technologies, it's easy to see how storage recovery may lag many hours -- or even days -- behind the restoration of server platforms and networks in a post-disaster scenario.
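The arithmetic is unforgiving: time to data is simply volume divided by restore throughput. The numbers below are hypothetical but of a realistic order, and even these generous assumptions put a full restore at nearly three days:

    # Hypothetical time-to-data estimate: restore time = volume / throughput.
    data_tb = 5                     # data to restore, in terabytes (assumed)
    restore_mb_per_sec = 20         # sustained tape-restore rate (assumed)

    seconds = data_tb * 1_000_000 / restore_mb_per_sec   # 1 TB = 1,000,000 MB
    print(f"~{seconds / 3600:.0f} hours to restore {data_tb} TB")   # ~69 hours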
To make matters worse, recovery-facility vendors say that, while clients' data is increasing by more than 80 percent annually, requests for additional disk-storage capacity in the recovery environment are averaging less than 15 percent growth per year. This apparent gap in storage-recovery requirements may not be discovered, despite periodic plan tests, until it's too late.
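Compounding those two rates shows how fast the gap opens. If a client's contracted recovery capacity matched its data when the plan was written, three years of 80 percent data growth against 15 percent capacity growth leaves the recovery environment able to hold only about a quarter of the data:

    # Compounding the article's rates: data grows ~80%/yr, contracted
    # recovery capacity grows ~15%/yr. Assume they matched in year 0.
    data, capacity = 1.0, 1.0
    for year in (1, 2, 3):
        data *= 1.80
        capacity *= 1.15
        print(f"year {year}: recovery capacity covers {capacity / data:.0%} of data")
    # year 3: roughly 26 percent of the data fits in the recovery environment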
Topologies for storage within the corporate IT infrastructure have proliferated. SAS (server-attached storage) is out; networked storage -- including NAS (network-attached storage) and SANs (storage area networks) -- is in. Networked storage solutions will show a robust compound annual growth rate of 67 percent from 1999 through 2003, according to IDC, while storage solutions based on the traditional server-with-attached-storage-array model will decline by 3 percent during the same period.
Of course, companies rarely mothball older, still-serviceable storage components when they bring new storage components in-house. Thus, the move to NAS and SANs merely increases the number of platforms on which data is stored -- as well as the number of targets to which data must be restored following a disaster.
Networked storage solutions can pose special difficulties that significantly degrade the already-marginal speeds of most tape-based data-restoration solutions. For example, in a SAN, physical disk devices are increasingly "managed" by storage domain servers, storage routers and/or software-based virtualization products that work to deliver virtual volumes to SAN-attached servers. These provide the real value of a SAN: They enable dynamically scalable volumes -- comprising many distributed physical disks and array partitions -- that can be grown or shrunk to meet changing storage demands.
In a storage-restoration situation, these SAN virtualization layers must also act as interpreters, or filters, that direct data streams back to the target disks and partitions that make up the virtual volume where data normally resides. This process introduces several thorny issues related to how data-layout records are maintained, and how the records can be interpreted efficiently by the virtualization products so that data is restored correctly and quickly.
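Conceptually, those data-layout records form an extent map from a virtual volume's logical blocks to the physical disks and offsets behind it, and every block in a restore stream has to be routed through that map. The sketch below is a deliberately simplified illustration of the idea, with hypothetical structures and device names, not any vendor's actual metadata format:

    from dataclasses import dataclass

    @dataclass
    class Extent:
        """One slice of a virtual volume: a run of logical blocks on one disk."""
        logical_start: int    # first logical block this extent covers
        length: int           # number of blocks in the extent
        disk: str             # physical disk or array partition (hypothetical IDs)
        physical_start: int   # starting block on that disk

    # Data-layout records for one virtual volume spread over three arrays.
    layout = [
        Extent(0, 1000, "array-A/lun0", 5000),
        Extent(1000, 2000, "array-B/lun3", 0),
        Extent(3000, 500, "array-C/lun1", 9000),
    ]

    def route_block(logical_block):
        """Direct one restored block to its physical disk and offset."""
        for ext in layout:
            if ext.logical_start <= logical_block < ext.logical_start + ext.length:
                return ext.disk, ext.physical_start + (logical_block - ext.logical_start)
        raise ValueError(f"block {logical_block} is outside the layout records")

    print(route_block(1500))    # -> ('array-B/lun3', 500)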
Several storage vendors, including Veritas Software Corp., have launched initiatives to address these issues, but solutions have yet to materialize. Ironically, early adopters of SAN technology often cite efficient backup as one of their primary reasons for embracing the topology. Restoration, however, remains an important limitation on SAN efficacy -- especially as virtualization approaches come to the fore. With newer SANs (as with many RAID 5 arrays today), data is easy to back up but difficult and slow to restore.