Maintaining up-to-date and correct disaster recovery (DR) plans is a constant challenge in an IT environment. Many companies struggle to maintain and validate their DR plans, which decreases the likelihood of a successful recovery.
By taking a DevOps continuous deployment approach for your applications, you increase the robustness of your DR strategy through increased deployment cadence and a code-first practice.
Before looking at how DevOps can help, let’s look at some of the challenges commonly associated with DR. It’s important to remember that a DR plan is more than just an ordered restoration from backup followed by a validation process. Applications may require post-restoration configuration due to site changes, or they may need to be reinstalled, with the restored data imported afterwards.
DR plans are quickly outdated and invalid
DR plans are most commonly created when an application is deployed into production. This stage is commonly referred to as an Operational Readiness Check (ORC).
Getting a DR plan signed off is only the first challenge, and perhaps the easy part. Ongoing DR testing and validation is a complicated and time-consuming operation that can disrupt the live application. However, you can never know whether a DR plan is correct unless you test it.
Another issue to consider is that after the ORC and production deployment, a DR plan typically does not receive the required updates and gets forgotten. Application updates applied during the lifecycle of the application do not usually trigger a review of the DR plan. The plan then becomes invalid because it specifies the wrong versions, which can cause the recovery to fail in a DR event.
Taking an application update as an example, a hotfix might be applied that patches the database schema. If the DR plan is not updated to include the hotfix, you are likely to encounter a version conflict between the application and the database.
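The version conflict described above can be caught with a simple check at recovery time. The following is a minimal sketch, not a definitive implementation: it assumes the schema version is stored in the database itself (here using SQLite's built-in `user_version` pragma), and the names `EXPECTED_SCHEMA_VERSION` and `check_schema` are hypothetical.

```python
import sqlite3

# Hypothetical: the schema version this application build expects.
EXPECTED_SCHEMA_VERSION = 3


def check_schema(conn: sqlite3.Connection,
                 expected: int = EXPECTED_SCHEMA_VERSION) -> str:
    """Compare the schema version stored in the database with the one
    the application expects, and describe any mismatch."""
    actual = conn.execute("PRAGMA user_version").fetchone()[0]
    if actual != expected:
        return (f"version conflict: application expects schema "
                f"{expected}, database is at {actual}")
    return "ok"


# Simulate a database restored from a backup taken before the hotfix.
restored = sqlite3.connect(":memory:")
restored.execute("PRAGMA user_version = 2")
print(check_schema(restored))
```

Run as part of every deployment, a check like this flags a stale recovery procedure long before a real DR event does.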
Testing is complex and not frequent enough
DR plan testing is the process of running through a DR plan and documenting the steps to ensure that a successful recovery is possible. While these tests should replicate a full DR scenario, time, expense, and resource constraints often limit how closely they can.
A large company I worked with would test by performing a full DR failover to a dedicated DR datacenter. This ensured no interruption to live workloads. Still, it was a massive undertaking, even before attempting the restore.
Small companies that I have worked with didn’t have that resourcing available. The solution was to test DR plans off-site using off-site backups.
Operations teams don’t have experience deploying applications
One way to look at DR is as performing an application deployment. After an application is deployed, it remains in that state until it’s time to decommission it. When it comes to DR, this is not a good thing, because people do not maintain their experience in deploying the application.
Application deployments are a learning experience, and when a deployment is performed only once, that experience gets lost over time. The deployment experience may also change with minor version updates, through bug fixes or bug additions.
Incorporating DevOps practices can help mitigate out-of-date DR plans and infrequent testing by increasing deployment frequency and using code-based configuration.
A DevOps approach increases deployment cadence, particularly when continuous integration / continuous delivery (CI/CD) pipelines are added to the process. Deployments will occur more frequently in Dev and Test environments than in Prod. However, the key is that deployment is practiced often, which maintains staff skills and validates the deployment process.
In addition to increasing the frequency of deployment, CI/CD pipelines validate deployments between environments or sites. Handling the environmental differences between Test and Prod helps you handle the environmental changes that occur when moving to a DR site.
Automated validation checks are what make this deployment cadence possible: deploy, test, and report. These tests can be incorporated into the DR plan to validate the restoration of the application.
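As an illustration of the test and report steps, here is a minimal sketch of a validation runner. The check names and the `run_checks` and `report` helpers are hypothetical; a real pipeline would replace the placeholder checks with calls against the deployed application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run every check; an exception counts as a failure, not a crash."""
    results = []
    for name, check in checks.items():
        try:
            results.append(CheckResult(name, bool(check())))
        except Exception as exc:
            results.append(CheckResult(name, False, str(exc)))
    return results


def report(results: List[CheckResult]) -> bool:
    """Print one line per check and return True only if all passed."""
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        print(f"[{status}] {r.name} {r.detail}".rstrip())
    return all(r.passed for r in results)


def database_reachable() -> bool:
    # Placeholder for a real connectivity test.
    raise ConnectionError("database unreachable")


# Hypothetical checks standing in for real post-deployment tests.
results = run_checks({
    "service responds": lambda: True,
    "database reachable": database_reachable,
})
all_passed = report(results)
```

Because the same runner executes after every deployment, pointing it at a restored application gives the DR plan a validation step that has already been exercised many times.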
Unlike a static DR plan, deployment validation checks are continually updated alongside the application, which keeps them up to date. They must also provide a high level of coverage to ensure that deployment faults are detected.
The deployment workflow becomes your new DR plan, but this time it is continually being maintained as part of the application.
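To make the idea concrete, the sketch below treats the workflow as an ordered list of steps that runs unchanged against any site, with the DR site as just another entry in the configuration. The host names, step functions, and the `deploy` helper are all hypothetical placeholders rather than a real tool's API.

```python
from typing import Callable, Dict, List

# Hypothetical per-site settings; in practice these would be kept in
# version control alongside the application.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "test": {"db_host": "db.test.example.internal"},
    "prod": {"db_host": "db.prod.example.internal"},
    "dr": {"db_host": "db.dr.example.internal"},
}


def install_application(settings: Dict[str, str]) -> str:
    return f"installed application, pointing at {settings['db_host']}"


def restore_data(settings: Dict[str, str]) -> str:
    return f"restored latest backup into {settings['db_host']}"


def validate(settings: Dict[str, str]) -> str:
    return "ran deployment validation checks"


# The ordered workflow: the same steps for every environment.
WORKFLOW: List[Callable[[Dict[str, str]], str]] = [
    install_application,
    restore_data,
    validate,
]


def deploy(env_name: str) -> List[str]:
    """Run every workflow step against the named environment."""
    settings = ENVIRONMENTS[env_name]
    return [step(settings) for step in WORKFLOW]


for line in deploy("dr"):
    print(line)
```

Only the settings differ between sites, so each routine deployment to Test or Prod also rehearses the steps a DR recovery would follow.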
The information above is not limited to custom applications. Any service that you interact with programmatically can benefit from the same principles.