• 04/22/2011
    11:01 AM

Amazon's Cloud May Seem Magical, But It Isn't

By now you have heard that Amazon Web Services had a massive disruption yesterday, affecting Elastic Compute Cloud (EC2) instances in the company's northern Virginia data center. The disruption was/is long-lived (Amazon's dashboard is still showing problems), and certainly blew any claims for an annual uptime of 99.9 percent, which allows 8.76 hours of downtime per year. In fact, it likely blew 99.8 percent uptime, which allows 17.52 hours of downtime per year. While 99.8 percent sounds good, the fact that some s

Of course, even the best-laid plans can be waylaid by unforeseen consequences, which is likely what happened to Amazon. I don't believe the company would have designed in such a catastrophic failure point on purpose.

What are the options? Stay out of cloud computing? Maybe. Cloud computing, with its automated management plane, is still young. A lot of smart thinking has gone into the operations and management of big, automated computing services, but there is more to come. Cloud computing is still cutting edge and needs to mature. Of course, I get a chuckle from the idea that cloud availability issues can be solved by using multiple cloud providers via an automated method.

While I haven't given that idea a great deal of thought, I'd need to see some serious proof points to start to believe it. Adding more clouds doesn't necessarily mean additive availability. Building a computing system from products that rely on each other and each offer five-nines reliability actually reduces the statistical uptime, because the system is up only when every one of those products is up. The reliability is multiplicative, not additive. Adding more clouds won't magically make your services more reliable. What you need to do, if you are planning on using cloud services, is to examine the applications you want to put in the cloud and consider how they can be redesigned for resilience. Your application--as a system that includes hardware, software, services, etc.--has to be designed to recover from failure. As George Reese, CTO of Enstratus, said on Twitter, "When you put the responsibility for availability on software, your hardware options increase and your costs go down. And, ultimately, you get greater availability."
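The uptime arithmetic used in this piece, and the point about multiplicative reliability, can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, the year is taken as 365 days (8,760 hours), and component failures are assumed to be independent, which real systems rarely guarantee.

```python
from math import prod

# Assumption: a non-leap year of 365 days.
HOURS_PER_YEAR = 365 * 24  # 8760


def annual_downtime_hours(availability: float) -> float:
    """Hours of downtime per year allowed by an availability fraction (0.999 = 99.9%)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def series_availability(availabilities) -> float:
    """Availability of a system that needs every component up at once.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(availabilities)


if __name__ == "__main__":
    # The two uptime levels quoted in the article.
    for level in (0.999, 0.998):
        print(f"{level:.1%} uptime allows {annual_downtime_hours(level):.2f} hours down per year")

    # Three hypothetical five-nines products that all depend on each other.
    five_nines = 0.99999
    combined = series_availability([five_nines] * 3)
    print(f"one component:    {annual_downtime_hours(five_nines) * 60:.1f} minutes down per year")
    print(f"three in series:  {annual_downtime_hours(combined) * 60:.1f} minutes down per year")
```

Under those assumptions a single five-nines product is allowed a little over five minutes of downtime per year, while three of them chained together are allowed nearly sixteen. Every dependency you add lowers the ceiling on the whole system's availability.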
