Many organizations managing their own infrastructures have tried to create high availability (HA) application environments—in which applications will be accessible no less than 99.99% of the time—by using multiple servers or virtual machines (VMs) configured as a failover cluster. If the cluster node running a mission-critical application goes down, for example, a secondary node in the failover cluster can take over in an instant and pick up where the other node left off.
Such failover clusters typically rely on a storage area network (SAN) for shared data storage. But a shared SAN itself constitutes a single point of failure that can compromise high availability. If the SAN goes down, the SQL Server or Oracle database supporting your mission-critical systems is unavailable, and it doesn't matter how many nodes in the failover cluster might be poised to interact with it.
For organizations considering the cloud for an HA application environment, there’s an even more pressing problem: while some cloud vendors do offer shared storage options, not all of those options guarantee 99.99% availability.
Does that mean you need to abandon the cloud as an option for an HA application environment? No; it just means you need to rethink how you configure a failover cluster.
Understanding High Availability in the Cloud
The first thing to understand about the cloud is that it’s very easy to spin up and configure new VMs. In fact, Azure, AWS, and Google Cloud all make it easy to create high availability clusters composed of multiple VMs running in different data centers, also known as zones or availability zones. By configuring your VMs in multiple zones, you eliminate the risk that a zone-wide catastrophe could take down all your critical infrastructure.
But if you read the service level agreements (SLAs) from the major cloud providers, you’ll notice a critical caveat: if you configure your HA cluster with VMs in multiple zones, the SLAs guarantee connectivity to at least one of those VMs 99.99% of the time. They don’t guarantee that your application will be operative, only that you’ll be able to reach one of the VMs.
That’s a critical distinction, and it harks back to the problem with a SAN: it doesn’t matter how many VMs you can reach if your applications can’t access your data.
Share Data, Not Storage
This brings us back to the issue of rethinking how you configure a failover cluster in the cloud.
If you expect any VM in your cloud-based failover cluster to be able to take over your production workloads in the event of a failure (and that is why you deployed an HA solution in the first place), you need to configure each VM in the cluster with its own storage. You also need a mechanism that actively replicates the data on the active cluster node to the secondary nodes. That way, if the active VM goes offline for any reason, the cluster can fail over to a secondary VM that already holds all the data your application needs to come back online in a matter of seconds.
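As one concrete illustration of this pattern, DRBD is an open-source Linux tool that synchronously mirrors a block device between two nodes, each with its own local disk. The sketch below is a minimal resource definition; the hostnames, IP addresses, and device paths are hypothetical, and a production configuration would include additional tuning and fencing settings:

```
# /etc/drbd.d/r0.res  (illustrative sketch; names and addresses are hypothetical)
resource r0 {
  protocol C;              # protocol C = fully synchronous replication
  device    /dev/drbd0;    # replicated block device presented to the application
  disk      /dev/sdb1;     # each node's own local backing disk
  meta-disk internal;
  on node-a {
    address 10.0.1.10:7789;
  }
  on node-b {
    address 10.0.2.10:7789;
  }
}
```

The key point is that each node keeps its own copy of the storage, so losing one node (or one zone) never means losing access to the data.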
A variety of data replication solutions can provide the services your organization needs to ensure true application high availability in a cloud-based deployment. Look for synchronous, block-level replication, to start. Synchronous replication ensures that a transaction written to storage on the primary system is also written to storage on the secondary systems before the transaction is considered complete, so a failover leaves no committed transactions behind.

Block-level replication matters because it operates beneath the file system and the application layer: everything written to the replicated volume is copied to secondary storage, not just the files belonging to one application. If your primary cloud infrastructure supports more than one application, or if you use that storage as a repository for multiple applications, all of that data, not just the data associated with your Oracle or SQL Server database, will be replicated to the secondary infrastructure, where it will be available to any applications or users if the secondary infrastructure is unexpectedly called into service.
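The synchronous guarantee described above can be sketched in a few lines of Python. This is a toy model, not a real replication product (real solutions work at the disk-driver level, below the file system): a write is acknowledged only after every secondary holds the same block as the primary.

```python
# Toy sketch of synchronous, block-level replication (illustrative only).

class BlockDevice:
    """A minimal block store: block number -> bytes."""
    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data


class SynchronousReplicator:
    """Acknowledges a write only after every replica has stored the block."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def write(self, block_no, data):
        self.primary.write(block_no, data)
        for replica in self.replicas:   # synchronous: wait for each replica
            replica.write(block_no, data)
        return True                     # only now is the transaction complete


primary = BlockDevice()
secondary = BlockDevice()
repl = SynchronousReplicator(primary, [secondary])
repl.write(0, b"transaction record")

# After acknowledgement, the secondary holds identical data, so a failover
# node can resume the workload without losing committed transactions.
assert secondary.blocks == primary.blocks
```

An asynchronous replicator, by contrast, would acknowledge the write before the loop over replicas finished, which is exactly the window in which a failure can lose committed transactions.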