Despite A Claim of Five-Nines Uptime, A 48-Hour Outage Blew The Curve
July 09, 2010
As an organization that prides itself on working smarter, not harder, we have been an early adopter of cloud computing. This has meant learning the hard way that just because an application is "in the cloud" does not mean it is more scalable, more reliable, or that it performs any better than if you ran it on a server down the hall. If you are considering any type of "cloud" variation, you had better take the time to address much more than just the technology you are trying to offload from your data center.
My own experience in this area has been very bad. Recently we contacted our cloud-based Microsoft Exchange provider, Intermedia.net, to let them know that our employees could not access their e-mail through Outlook. Two days later, we finally had Level Two support on the phone with us, only to learn that a corrupted iPhone ActiveSync account was overloading their main servers, and that this was the cause of the problem. Who were these users with corrupt ActiveSync profiles? They did not work for us! They belonged to a completely different company. Their users caused our service to crash, and it took Intermedia tech support two days to locate the problem. So much for the cloud being better.
As an analytics company, we wanted to know why it had taken so long to track down the problem. Were they not monitoring the servers for resource exhaustion? "We monitor everything 24 hours a day," was the response we got. Either they are not telling the truth about monitoring, or they don't realize that a server at 100 percent CPU utilization for two days can be a problem. I informed them that they needed to remove the 99.999 percent uptime claim from their website, as they had blown that out of the water in just two days in June. As of now, the claim is still there.
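To put numbers on that complaint, here is a rough sketch of the arithmetic. It assumes a 365-day year and treats the 48-hour outage as the only downtime in that year; the function names are mine, chosen for illustration, and are not taken from any provider's tooling.

```python
# Downtime budget implied by an availability claim, measured over a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year that a given availability percentage permits."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def availability_after_outage(outage_hours: float) -> float:
    """Best-case yearly availability (percent) if this outage is the only downtime."""
    outage_minutes = outage_hours * 60
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


budget = allowed_downtime_minutes(99.999)   # roughly 5.26 minutes per year
actual = availability_after_outage(48)      # roughly 99.45 percent

print(f"Five-nines budget: {budget:.2f} minutes per year")
print(f"Best case after a 48-hour outage: {actual:.2f} percent")
```

A five-nines claim leaves room for only about five minutes of downtime in a year. A single two-day outage uses up that budget several hundred times over and caps the year at roughly 99.45 percent, even if nothing else goes wrong.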
When you evaluate any cloud solution, you should ignore any claims of uptime unless the provider is willing to share actual metrics from their service tracking. Don't simply rely on feel-good statements like "we monitor everything 24 hours a day." Push the providers to show you what that means. Get detailed information and a concrete SLA for the applications, not just the service. For example, if your users use Outlook, then have the SLA reflect availability of MAPI, HTTPS and SMTP for email. If they are unwilling to do this, find another provider.
If you have any doubts, push for the details. Most cloud providers, like most enterprises, confuse the ability to have actionable data with actually having actionable data. When pressed to provide a report on HTTPS service availability, for example, you may find that their tales of sophisticated NOCs and SLAs evaporate. If they can't provide a report that shows how available a service is over time, then it's likely they aren't monitoring it to see if it is available.
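As an illustration of how little it takes to produce such a report, here is a minimal sketch that turns raw probe results into availability per service per day. Everything in it is an assumption for the sake of the example: the Probe record, its field names, and the idea that the provider logs one pass/fail check per protocol at regular intervals. It is not a description of how Intermedia or any other provider actually monitors.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Probe:
    """One hypothetical monitoring check against a single protocol."""
    day: str      # any label that groups probes into a reporting period
    service: str  # for example "MAPI", "HTTPS" or "SMTP"
    ok: bool      # True if the check succeeded


def availability_report(probes):
    """Return {(service, day): percent of probes that succeeded}."""
    counts = defaultdict(lambda: [0, 0])  # (service, day) -> [successes, total]
    for p in probes:
        entry = counts[(p.service, p.day)]
        entry[0] += int(p.ok)
        entry[1] += 1
    return {key: 100 * ok / total for key, (ok, total) in counts.items()}


# Made-up sample data, for illustration only.
sample = [
    Probe("day-1", "HTTPS", True),
    Probe("day-1", "HTTPS", False),
    Probe("day-1", "SMTP", True),
]
for (service, day), pct in sorted(availability_report(sample).items()):
    print(f"{day} {service}: {pct:.1f} percent available")
```

If a provider really is recording checks like these around the clock, a table of this kind for MAPI, HTTPS and SMTP should be easy for them to hand over.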