• 07/30/2014 7:15 AM

Measuring Cloud Performance Is Like Flying Blind

Legacy systems management software and tools are woefully inept at providing visibility into cloud applications and services. Here's why we need new methods.

Recently, we conducted a study asking IT teams about their current and planned use of cloud apps and services within their organizations. One point really stood out: just 17% of respondents felt like their existing tools do a good job managing and monitoring their cloud-based apps. The rest were at best ambivalent about their existing tools, or felt that they weren't up to the task.

Why? The systems management software market is mature, with sophisticated products to manage everything from software distribution to monitoring to IT workflow and help desk activities. Why can't these products effectively manage cloud-based apps and services?

There are a few reasons to consider.

Infrastructure ownership and access
Prior to the emergence of cloud computing, IT's management responsibilities didn't extend much beyond the walls of the enterprise, or for larger organizations, the periphery of their corporate area network. Traditional systems management tools were built and optimized with that in mind. Administrators had direct access to network, storage, and compute "boxes" that produced ample amounts of log files or SNMP messages. So their tools only needed to tap into those data feeds, send an alert when something happened, and maybe correlate logs from multiple systems so they could search for and identify trends.
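The log-tapping model those tools relied on can be sketched in a few lines. This is a purely illustrative sketch, not any vendor's actual implementation: it scans a host's own log feed for alert-worthy lines and correlates hits across hosts, which only works when you have direct access to the boxes producing the logs.

```python
import re

# Severity keywords a traditional monitor might alert on (illustrative only).
ALERT_PATTERN = re.compile(r"\b(ERROR|CRITICAL)\b")

def scan_log_lines(lines):
    """Return the lines that should trigger an alert.

    Mimics a legacy monitor tailing a local log file: this works only
    because the admin has direct access to the machine writing the log.
    """
    return [line for line in lines if ALERT_PATTERN.search(line)]

def correlate(feeds):
    """Merge alert lines from several hosts' feeds, tagged by host name,
    the way legacy tools correlate logs to surface cross-system trends."""
    return [(host, line)
            for host, lines in feeds.items()
            for line in scan_log_lines(lines)]
```

With a SaaS app there is no log file to tail in the first place, which is exactly the gap described above.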

But cloud has changed all that. If you run apps on Amazon Web Services or Azure, you can access app and OS logs, but below the OS you're blind. SaaS apps are completely black box: no log files, no SNMP messages, and most likely not even a management API to interface with. If your tools rely on these mechanisms, you're stuck.

Application management and network operations
It used to be that when users experienced a problem with an application, the IT team had tools to look for reported errors from the application or network. With traditional on-premises apps, the user device, access network, and application likely all reside on the same network, so between app tools and infrastructure tools, IT could usually find and fix issues fairly quickly.

That's not the case with cloud-based apps. Now the application is a service that must be monitored and maintained. It's built on a complex web of networks, servers, and other services, most of which fall outside the organization's firewall. An admin now must have insight into the health of the application service as well as the networks and services (like ADFS) that are key to delivery of that application service. Neither the apps nor the ops group has a full view of the service delivery chain anymore, so they go back and forth, pointing fingers and guessing at the root cause. We call this "chasing ghosts."

The agility mismatch
Traditional systems management solutions support a healthy industry of consultants and systems integrators with official certifications and ISO-9000 compliant project plans. It's not that these tools are poorly engineered; they simply evolved with the sophistication and complexity of enterprise applications and infrastructure management over the past decade.

It's that same complexity explosion that is driving many organizations to the cloud. In fact, nearly 50% of respondents in our survey indicated that agility was a key driver for their move to the cloud.

In the cloud, new apps and services come into the IT portfolio and are updated on a weekly basis. A management tool stack that requires a complex update every 12 months can't keep up.

Performance monitoring in a cloud era
The service delivery chain for cloud apps stretches across many service providers and local data centers and networks, with constant changes in access servers and routes. It's simply not feasible to have a tool that can find and access all the service nodes. So how can IT monitor cloud app performance?

In many cases, web pages or data protocols may be the only way to access these services, which makes synthetic testing of cloud-based apps and services a necessity. By tracking those measurements over time, admins can start to identify anomalies and cyclical issues. As organizations move to the cloud, this shift will usher in a new era of IT management.
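A minimal synthetic test can be as simple as fetching the app's front page on a schedule and recording latency and status. The sketch below is an assumption-laden illustration using only the Python standard library; the URL, sample count, and interval are placeholders you would tune per service:

```python
import time
import urllib.request

def probe(url, timeout=10):
    """Fetch a URL once; return (latency_seconds, HTTP status) on success,
    or (None, error description) on failure."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include transfer time in the measurement
            return time.monotonic() - start, resp.status
    except Exception as exc:
        return None, repr(exc)

def run_probes(url, samples=3, interval=1.0):
    """Collect repeated measurements; the trend over time, not any single
    sample, is what reveals anomalies and cyclical slowdowns."""
    results = []
    for _ in range(samples):
        results.append(probe(url))
        time.sleep(interval)
    return results
```

In practice you would run probes like this from several user locations and feed the series into a trending tool, since a single vantage point can't distinguish an app problem from a network problem.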



Tried Amazon custom metrics? Compuware? Cedexis?

Patrick, you can sign up for more performance metrics than AWS provides by default, if you pay a fee for them. I'm curious what they show or don't show, from your point of view. Also, Cedexis and Compuware offer metrics derived from synthetic testing for various services, and Compuware will test your application running on Amazon or elsewhere, for a fee. Have you tried them? I'm interested in how effective some of these services are.

Re: Tried Amazon custom metrics? Compuware? Cedexis?

Your questions highlight the differences between monitoring apps you host on IaaS (e.g., Amazon Web Services, Azure, etc.) vs. monitoring 3rd-party SaaS apps like Dropbox or Office 365 (full disclosure up front: my company is working to address these cloud visibility gaps).

If an app I manage is on AWS, their data feeds will give me some useful data, at least on the health of the VMs and their infrastructure.  If it's a custom app, I may also want to use something like New Relic and/or an external synthetic solution like Compuware to give me more insight into my app's performance and user experience.

The challenge for IT teams consuming 3rd party cloud applications, like Salesforce, is that the service level they receive depends on the health of all the infrastructure between the users and the service providers.  Just getting a feed from the application service provider isn't enough.  IT needs to be able to test and monitor the end-to-end service delivery chain.
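One way to make "test the end-to-end chain" concrete is to time DNS resolution and TCP connect to each node a user traverses on the way to the SaaS app. The hostnames below are hypothetical placeholders for a proxy, a federation endpoint, and a SaaS front end; the sketch uses only the standard library and is a starting point, not a product:

```python
import socket
import time

def check_hop(host, port, timeout=5):
    """Time DNS resolution and TCP connect to one node in the chain."""
    t0 = time.monotonic()
    try:
        ip, resolved_port = socket.getaddrinfo(host, port)[0][4][:2]
        dns_ms = (time.monotonic() - t0) * 1000
        t1 = time.monotonic()
        with socket.create_connection((ip, resolved_port), timeout=timeout):
            connect_ms = (time.monotonic() - t1) * 1000
        return {"host": host, "ok": True,
                "dns_ms": dns_ms, "connect_ms": connect_ms}
    except OSError as exc:
        return {"host": host, "ok": False, "error": str(exc)}

def check_chain(chain):
    """Probe every hop so a failure can be localized, not just detected."""
    return [check_hop(host, port) for host, port in chain]

# Hypothetical delivery chain for a federated SaaS login (illustrative names).
CHAIN = [("proxy.example.internal", 8080),
         ("adfs.example.com", 443),
         ("login.example-saas.com", 443)]
```

Knowing *which* hop failed is what tells you whether to escalate to the SaaS provider, the ISP, or the internal network team.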

Superficial Metrics

Patrick, yes, you are right. Being in the performance area myself, I have always thought this was a weak link in the cloud. Traditionally, as an enterprise-wide performance person, I could get in-depth metrics from the web, network, database, and OS layers. With the cloud I am restricted to only a few levels of metrics and have to compare before-and-after metrics to get an idea of my application's performance.

Re: Superficial Metrics

I think this is one good reason for maintaining a hybrid public/private hosted solution, at least for the time being.

Cloud performance

To play the devil's advocate here, isn't part of the point of using cloud services in general that you don't have to monitor the performance -- the cloud provider does that for you, and ramps up the resources when things are slow or congested? If you're using cloud apps, then you would be monitoring them just to ensure you're not getting ripped off by your provider, right? And you can't react if you see a problem with the service anyway, because it's not your infrastructure.

Re: Cloud performance

Hi, Susan.  Those are all common misconceptions organizations have about the cloud.  Yes, the cloud service provider monitors their infrastructure for you, but only that.  They don't monitor any of the other networks and servers that sit between your users and them.

The reality is that the major providers all operate their services at nearly 99.999% availability, a far higher level than most customers achieve in their own environments.  The service level the user actually experiences depends on the health of the end-to-end chain of infrastructure that connects them to the cloud app servers.  To effectively monitor and manage the service level, IT teams need to look holistically across this chain.

You're right that as an IT admin I can't directly "fix" infrastructure that's outside my organization.  But this is precisely why it's so important to be able to pinpoint the source of service delivery problems.  When there's an issue, I have to decide whether to escalate to a) the cloud app provider, b) my ISP, or c) my internal network/ops team.  Picking the wrong path can mean the difference between restoring service in a few minutes vs. several hours or even days (if you've ever gone through a support incident cycle with one of these cloud app providers, you'll know that's not an exaggeration).

Monitoring doesn't go away in the cloud.  It just changes...and becomes more challenging.