Network Computing is part of the Informa Tech Division of Informa PLC

New Options Power Always-On Apps: Page 2 of 2

Virtually Constant
Just as in data center management, server virtualization has enabled administrators to greatly improve availability. The most obvious impact is that virtualizing standby servers breaks the expensive one-to-one relationship between production and standby servers, thus reducing the cost of providing standby servers. Because all virtual servers on the same virtualization platform look identical to the guest OS, driver and other hardware-related issues are eliminated.

Organizations can also quickly provision servers in data center high-availability systems at the virtual server host. VMware High Availability, Microsoft Clustering of Hyper-V, and Linux failover solutions for Xen all protect guest servers from host failures and allow host maintenance without significant guest downtime. Marathon's EverRun for Hyper-V and Xen can extend host protection to true multisite disaster recovery as well.

While VMware's Site Recovery Manager (SRM) does require some scripting, it also provides for site-to-site failover of virtual servers across a variety of applications and guest operating systems. SRM relies on storage arrays to replicate the data from site to site, and array manufacturers have to write an adapter to enable SRM to manage the replication process.

Another recent trend in application availability has been the development of high-availability and disaster-recovery solutions that are not only application-aware but also operate at the application layer. A general-purpose solution, by contrast, replicates the file- or block-level writes made to the primary host's storage to a standby storage system.

Regardless of whether the replication is done using software in the primary and standby hosts or by the storage system itself, the application's database is being managed by the primary host with the standby's copy of the application sitting idly by. When the failover occurs, the application starts on the standby server and mounts its "crash-consistent" copy of the database. (Crash consistent is the industry euphemism for a database that's as consistent as it would be when the server crashes -- or, in plain English, not consistent at all since some number of transactions were assumed to be in the middle of being processed when the server crashed.) Therefore, the first thing the server has to do is a quick consistency check to roll back the transactions that were in progress when the crash occurred. This process usually takes just a minute or two but can occasionally leave the server unavailable to users for several hours as the database is checked and reindexed, especially if the crash occurs in the middle of a database defragmentation.
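The consistency check described above can be illustrated with a toy model. The following is a minimal sketch, assuming a hypothetical log format of (transaction ID, operation, key, value) records; it is not how any particular database engine implements recovery. It rebuilds the data from the log and discards writes from transactions that never committed.

```python
from typing import Dict, List, Tuple

# Hypothetical log record: (transaction id, operation, key, value).
LogRecord = Tuple[int, str, str, int]


def recover(log: List[LogRecord]) -> Dict[str, int]:
    """Rebuild the database from the log, keeping only committed transactions."""
    # A transaction counts as complete only if a commit record made it to the log.
    committed = {txid for txid, op, _, _ in log if op == "commit"}
    db: Dict[str, int] = {}
    for txid, op, key, value in log:
        # Writes from uncommitted transactions are skipped, i.e. rolled back.
        if op == "write" and txid in committed:
            db[key] = value
    return db


log = [
    (1, "write", "alice", 100),
    (1, "commit", "", 0),
    (2, "write", "bob", 50),
    (2, "write", "alice", 50),  # the server crashed before transaction 2 committed
]
recovered = recover(log)
print(recovered)  # {'alice': 100}
```

In this toy case the check is instant; on a real system the same work scales with the size of the log and the database, which is why recovery can occasionally stretch from minutes to hours.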

Application-specific solutions replicate transactional data to a standby server where the running application applies the transaction to its copy of the database. This approach has several advantages. First, because the backup server is running the application, it usually doesn't take long to fail over to the backup, start the application processes, and mount the database. Second, posting completed transactions prevents many sources of database corruption, such as those caused by malware on the primary host, or storage system I/O errors, from propagating to the backup server.

The secondary server can also be used as a data source for operations like backup, archiving, and reporting, allowing these processes to run anytime without affecting users.

Replicating transactional data also reduces the amount of data that must be sent between primary and secondary data stores. Modern databases write data to transaction log files and then, when the transaction is complete, to the on-disk database. Solutions that replicate storage data must replicate the writes to both the transaction log and database, whereas transaction-based solutions only have to send any given transaction across the line once.
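A rough back-of-the-envelope comparison makes the difference concrete. The sketch below rests on a simplifying assumption: each transaction's payload is written once to the transaction log and once to the database file, with no metadata or block-alignment overhead. The function name and the sample sizes are illustrative only.

```python
from typing import List


def bytes_replicated(transaction_sizes: List[int], storage_level: bool) -> int:
    """Estimate bytes sent to the standby under a simplified write model."""
    # Assumption: storage-level replication sees two writes per transaction
    # (log plus database); transaction-level replication sends each one once.
    writes_per_transaction = 2 if storage_level else 1
    return sum(transaction_sizes) * writes_per_transaction


sizes = [4096, 8192, 2048]  # hypothetical transaction payloads, in bytes
storage_bytes = bytes_replicated(sizes, storage_level=True)
transaction_bytes = bytes_replicated(sizes, storage_level=False)
print(storage_bytes, transaction_bytes)  # 28672 14336
```

Real ratios depend on the database engine and the replication product, so the factor of two here should be read as an illustration of the principle rather than a measured figure.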

The Essentials
Finding Failover That Fits
Server clusters:
Work best in smaller environments with basic failover needs

Application-specific software:
Tailors failover chores to individual apps; good choice when only a few apps are needed 24/7

Application-aware software:
Best for organizations that need fast remediation of failures that are more subtle than complete server crashes

Virtual failover systems:
Make the most sense for larger enterprises with multiple critical apps on many servers

Because the primary and standby servers each maintain their own copy of the database, transaction-based replication also avoids the time-consuming chore of replicating the disk writes created when applications perform internal database defragmentation -- which Exchange 2003 performs nightly -- and other housekeeping.

Many application-specific failover vendors have focused on Exchange, in no small part because it has so many moving parts and interconnections to Active Directory and other network services. Software like Cemaphore's MailShadow OnSite and SonaSoft's SonaSafe for Exchange captures data from the Exchange server using the native Exchange MAPI protocol and transfers it to a running Exchange server. One backup server can provide protection for several source servers and, with SonaSafe, production servers in different offices can back each other up. To fail over, these products run a script that updates the users' mailbox location data in Active Directory; users then connect to their mailboxes on the standby server.
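The failover step amounts to repointing each affected mailbox at the standby server. The sketch below models that idea with a plain dictionary standing in for the directory; it does not use any real Active Directory API, and the server and user names are hypothetical.

```python
from typing import Dict


def fail_over(directory: Dict[str, str], failed_server: str, standby_server: str) -> int:
    """Repoint every mailbox homed on failed_server to standby_server.

    Returns the number of mailboxes moved. The dictionary is a stand-in
    for the directory's mailbox-location data, not a real directory client.
    """
    moved = 0
    for user, server in directory.items():
        if server == failed_server:
            directory[user] = standby_server  # users now resolve to the standby
            moved += 1
    return moved


directory = {"ana": "exch-prod-1", "ben": "exch-prod-2", "cho": "exch-prod-1"}
moved = fail_over(directory, "exch-prod-1", "exch-standby")
print(moved, directory["ana"], directory["ben"])  # 2 exch-standby exch-prod-2
```

Mailboxes on servers that did not fail are left alone, which is what lets one standby protect several production servers.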

Teneros' Application Continuity Appliance packages the MAPI data acquisition, failover, and standby Exchange server in an appliance that's positioned inline between the users and the production Exchange server. It will also asynchronously replicate data to an additional appliance at a remote site for disaster-recovery purposes.

Microsoft has even dipped its toe in the water with disaster recovery features for Exchange 2007: cluster continuous replication (CCR) for high availability and standby continuous replication (SCR), both of which ship transaction log files from primary to secondary servers to keep the database up to date. SCR requires significant manual intervention, or scripting, to bring up the standby server, but it shows promise. CCR relies on Windows clustering for failover and is limited to having all the systems on the same subnet.

The downside to application-level failover is that, in the event of a primary server failure, at least a few transactions will be lost in transit: posted to the primary server but not yet replicated to the standby. So while these solutions improve recovery time, they can also negatively affect recovery points.
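The recovery-point trade-off can be seen in a small simulation. This is a minimal sketch, assuming asynchronous replication through an in-memory queue; the Primary and Standby classes and the transaction names are hypothetical and stand in for whatever shipping mechanism a given product uses.

```python
from collections import deque
from typing import Deque, List


class Standby:
    def __init__(self) -> None:
        self.applied: List[str] = []

    def apply(self, tx: str) -> None:
        self.applied.append(tx)


class Primary:
    def __init__(self) -> None:
        self.committed: List[str] = []
        self.outbox: Deque[str] = deque()  # committed but not yet shipped

    def commit(self, tx: str) -> None:
        # The transaction is acknowledged locally before it is replicated.
        self.committed.append(tx)
        self.outbox.append(tx)

    def ship(self, standby: Standby, count: int) -> None:
        # Send up to `count` queued transactions to the standby, in order.
        for _ in range(min(count, len(self.outbox))):
            standby.apply(self.outbox.popleft())


primary, standby = Primary(), Standby()
for tx in ["t1", "t2", "t3", "t4", "t5"]:
    primary.commit(tx)
primary.ship(standby, 3)  # only three transactions cross the wire

# The primary fails here; anything still in its outbox never reaches the standby.
lost = [tx for tx in primary.committed if tx not in standby.applied]
print(lost)  # ['t4', 't5']
```

The standby can take over immediately with t1 through t3 applied, which is the recovery-time gain, but t4 and t5 are gone, which is the recovery-point cost.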