Network Computing is part of the Informa Tech Division of Informa PLC


Continuity Software Sets The Bar High For Disaster Recovery Testing

After a major IT operational problem, or an outright disaster, getting applications working again with the data they need is vital to any public, private, or government enterprise. Yet how many enterprises could honestly raise their hands when asked, "Are you very confident in your business continuity plan?" Those that use Continuity Software's RecoverGuard solution (and perhaps a few other enterprise solutions) could honestly answer yes.

Enterprises that do not adequately protect themselves are running a high financial risk. Even though the probability of a disaster is relatively low, the associated costs can be astronomically high, so the expected value--the probability of the event multiplied by its cost--is likely to be substantial. And even when the expected value is not sky high, enterprises cannot afford a bet that places the business itself at risk. Auditors should mandate effective disaster recovery planning, not only for risk management but also for compliance reasons.
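The expected-value arithmetic above is straightforward to sketch. The probability and cost figures below are purely hypothetical, chosen for illustration; they are not from the article:

```python
# Illustrative only: expected annual loss = probability of disaster x cost.
# Both figures are assumptions for the sake of the example.
annual_disaster_probability = 0.02       # 2% chance per year (assumed)
estimated_disaster_cost = 10_000_000     # $10M business impact (assumed)

expected_annual_loss = annual_disaster_probability * estimated_disaster_cost
print(f"Expected annual loss: ${expected_annual_loss:,.0f}")
# Expected annual loss: $200,000
```

Even a modest 2 percent annual probability yields a six-figure expected loss here, which is the point of the paragraph: the bet is bad even before considering an existential worst case.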

Still, even IT organizations with the money and staff to perform high-availability (HA) and disaster recovery (DR) testing face numerous challenges. Testing tends to be labor-intensive and largely manual, and it can disrupt online applications. Even annual or quarterly testing is likely to produce disturbing results, as failure rates tend to be high: Continuity Software reports a 75 percent test failure rate, with recovery configurations found to be "out of sync" with their production configurations. That sounds high, but I suspect it is fairly accurate.

The reason is simple: IT infrastructures are very complicated. Implementing HA at a local operational site requires redundancy through technologies such as RAID, SAN multipathing, and clustering. Then, for DR, geographic redundancy requires a remote site with the necessary replication and failover capabilities.

That sounds fine, and on day one, when everything has been made to work properly, it probably is. However, IT solution architectures are dynamic, not static. The production environment at the local site changes constantly: new applications are deployed, storage is re-provisioned, virtual machines (VMs) are created on physical servers. Each of these production changes must then be manually applied to both the local HA systems and the remote DR systems--and any change that is missed leaves the recovery configuration out of sync.
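The drift problem described above can be sketched as a comparison between two configuration snapshots. This is a minimal illustration only; the keys and values are hypothetical, and real tools such as RecoverGuard inspect far richer infrastructure state than a flat dictionary:

```python
# Minimal sketch of configuration-drift detection between a production
# site and its DR replica. All settings below are hypothetical examples.

production = {
    "db_volume_size_gb": 500,
    "san_multipathing": True,
    "replication_enabled": True,
    "vm_count": 12,
}

dr_site = {
    "db_volume_size_gb": 400,   # storage re-provisioned in production only
    "san_multipathing": True,
    "replication_enabled": True,
    "vm_count": 10,             # new VMs never mirrored to the DR site
}

def find_drift(prod, dr):
    """Return settings whose production and DR values disagree."""
    return {
        key: (prod[key], dr.get(key))
        for key in prod
        if prod[key] != dr.get(key)
    }

print(find_drift(production, dr_site))
# {'db_volume_size_gb': (500, 400), 'vm_count': (12, 10)}
```

Manual testing finds these mismatches only when a failover is actually attempted; automated, continuous comparison of the two environments surfaces them as soon as they appear.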
