Amazon Web Services continues to dominate the cloud and IaaS in both market share and scale. With 41.5% of cloud application workloads hosted within AWS, Amazon is undoubtedly the most preferred cloud service provider in absolute terms. As with any application or service that moves to the cloud, using AWS requires you to adjust the approach for operational monitoring and performance management. Whether or not you literally “own” the infrastructure, application or networks that deliver your apps and services, as an IT leader or technology professional you own the outcome. Here are five factors that you should consider to deliver an optimal user experience while using AWS.
1. Gauge end user experience
Irrespective of which cloud provider your services are hosted on, you need to track the application performance from the perspective of the end user. The nature of the cloud means that these critical services and applications are delivered primarily over the internet, a public, “best-effort” network. The challenge is that for many IT infrastructure and network professionals, the toolsets they have access to can be very localized and specific to a particular aspect of the infrastructure or the application.
For example, Amazon CloudWatch can provide visibility into the performance of workloads hosted within a VPC, but it lacks the perspective of how these services are delivered to the end user. Consider continuously monitoring your AWS-hosted application performance with metrics such as HTTP response times and HTTP page load times from all key locations (on- or off-premises) that your users or customers will access the application from. Without a finger on the pulse of application performance and user experience, it can be maddeningly difficult to figure out whether a “red light” on a console correlates to something that users actually care about.
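As a rough illustration of the HTTP response time metric mentioned above, here is a minimal Python sketch that times a full GET request and summarizes a set of samples. The function names `time_http_get` and `summarize` are made up for this example, and the sketch assumes it would be run on a schedule from each location your users connect from; it is not a substitute for a real monitoring tool and does not measure full page load time.

```python
import statistics
import time
import urllib.request


def time_http_get(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the seconds taken to fetch `url` and read the whole body.

    `opener` is injectable so the function can be exercised without a network.
    """
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        response.read()  # include body transfer time, not just headers
    return time.perf_counter() - start


def summarize(samples_s):
    """Summarize response-time samples (seconds) as count, median and max in ms."""
    return {
        "count": len(samples_s),
        "median_ms": statistics.median(samples_s) * 1000,
        "max_ms": max(samples_s) * 1000,
    }
```

Calling `time_http_get` a handful of times against your own application URL from each vantage point, then passing the results to `summarize`, gives a simple per-location view to compare against what the provider-side console reports.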
2. Understand AWS inter-service dependencies
Earlier this year, a human-induced error brought down AWS’ S3 cloud object storage service. In addition to S3 suffering an outage, many other AWS services that depend on S3 -- Elastic Load Balancers, Redshift data warehouse, Relational Database Service -- also had limited to no functionality. Though the S3 service is often invoked on the backend and not readily apparent to end users, the outage revealed the many service dependencies on S3 and exposed a critical lack of cloud storage redundancy. Enterprises that rely on AWS to host critical services need to keep in mind their dependencies on both internal and external services.
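One way to reason about these dependencies is to write them down as a simple map and ask what is affected when one service fails. The Python sketch below does that. The S3 dependencies of the load balancers, Redshift and the database service come from the outage described above; the `checkout-api` and `static-site` entries are hypothetical applications added purely for illustration, and your own map would look different.

```python
def impacted_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    impacted = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for service, deps in depends_on.items():
            if current in deps and service not in impacted:
                impacted.add(service)
                frontier.append(service)  # follow dependents of dependents
    return sorted(impacted)


# service -> set of services it depends on
DEPENDS_ON = {
    "elastic-load-balancing": {"s3"},
    "redshift": {"s3"},
    "rds": {"s3"},
    "checkout-api": {"elastic-load-balancing", "rds"},  # hypothetical app
    "static-site": {"s3"},  # hypothetical app
}

print(impacted_by("s3", DEPENDS_ON))
```

Even a hand-maintained map like this makes it clear that an application that never calls S3 directly can still be exposed to an S3 outage through the services it sits behind.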
3. Evaluate AWS architectural dependencies
Enterprises that take advantage of multiple regions within AWS to distribute services for fault tolerance or load balancing should take into consideration the impact of inter-region latencies on application performance. Inter-region transit is entirely within the AWS network and does not traverse any third-party service providers. Network latency can vary from 20 to 200 milliseconds between regions, depending on geography. For example, the latency between US-West and US-East is significantly lower than the latency between US-West and AsiaPac-Sydney.
As Amazon continues to make investments in new infrastructure, inter-region performance will likely improve in the future. While you baseline inter-region performance, do not lose sight of inter-Availability Zone (AZ) performance. Inter-AZ connectivity within a region is designed for low latency, but actual latency tends to vary depending on the region. For example, inter-AZ latencies in US-East can be fairly consistent at 1-2ms, but inter-AZ latencies in EU-London can average 5ms, occasionally spiking up to 15ms. Bear in mind that Amazon maps AZs independently for each account, so your US-East-1a might not be the same location as someone else’s US-East-1a.
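To make the idea of baselining concrete, here is a minimal Python sketch that builds a baseline from latency samples and flags a new measurement that falls well outside it. The three-standard-deviation threshold and the function names are assumptions chosen for this example, not a recommended standard, and the sample values are illustrative numbers in the 1-2ms range mentioned above rather than real measurements.

```python
import statistics


def build_baseline(samples_ms):
    """Return (mean, standard deviation) for a list of latency samples in ms."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def is_deviation(latency_ms, baseline, threshold_sigmas=3.0):
    """True if a measurement exceeds the baseline mean by more than
    `threshold_sigmas` standard deviations."""
    mean, stdev = baseline
    return latency_ms > mean + threshold_sigmas * stdev


# Illustrative inter-AZ samples in milliseconds, not real measurements.
baseline = build_baseline([1.0, 1.5, 2.0, 1.5, 1.0, 2.0])
print(is_deviation(2.0, baseline))   # within the normal range
print(is_deviation(15.0, baseline))  # a spike worth investigating
```

Because latency differs by region and by AZ pair, a baseline like this would need to be kept separately for each path you care about.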
4. Watch out for ISP outages
To state the painfully obvious, if the internet isn’t working properly, then your AWS user experience will suffer. The more distributed your workforce or user base is, the more internet communication paths play a part. While traffic between regions is entirely controlled by AWS, traffic from end users destined for your hosted service enters the AWS network closest to the region in which the service is hosted. For example, if you have customers in India accessing services hosted in AWS US-West2, traffic from India will enter the AWS network at a peering point closest to the US-West2 data center in Oregon.
With an architecture like this, there is an increased dependency on multiple ISPs that directly impact application performance. That means a Tier1 ISP outage, like the recent Level3 route leak, can impact your end users. Keeping a watchful eye out for ISP outages can give IT teams the right data to plan for remediation, when possible. Also, understanding whether service issues originate from AWS infrastructure or a public ISP leads to faster resolution and more uptime.
5. Don’t forget about DNS
I recently heard a talk from a network manager at a large media company who has been measuring cloud-related performance and monitoring issues for a few years; his No. 1 piece of advice was to not forget DNS. His team found that DNS was a frequent culprit in slowing down the overall performance of internet-accessed applications. Whether it's Route 53 or any other third-party DNS service provider, keep track of DNS resolution times and DNS record mappings.
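Both things worth tracking here, resolution time and record mappings, can be sketched in a few lines of Python. The example below times a lookup through the operating system's resolver using the standard `socket.getaddrinfo` call and compares the returned addresses against a list you expect. The function names are made up for illustration, and because the OS resolver may answer from a local cache, the timing reflects what an application on that host would experience rather than the authoritative DNS provider's raw response time.

```python
import socket
import time


def time_dns_lookup(hostname, resolver=socket.getaddrinfo):
    """Resolve `hostname` and return (elapsed milliseconds, sorted unique addresses).

    `resolver` is injectable so the function can be exercised without a network.
    """
    start = time.perf_counter()
    results = resolver(hostname, None)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({entry[4][0] for entry in results})
    return elapsed_ms, addresses


def unexpected_addresses(addresses, expected):
    """Return resolved addresses that are not in the expected record mapping."""
    return sorted(set(addresses) - set(expected))
```

Running `time_dns_lookup` against your application's hostname at regular intervals, and alerting when `unexpected_addresses` returns anything, covers both a slow resolver and a record that has silently changed.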
When you move workloads or applications to the cloud, you enter a “new normal” zone for IT operations. You no longer own all the infrastructure or networks, so the types of dependencies you need to understand and monitor change. This means new baselines need to be determined and monitored for deviations. The good news is that, armed with awareness and sound monitoring, you can master the operational realities of the cloud and deliver predictable digital experiences for your users.
Archana Kesavan is a Sr. Network Analyst at ThousandEyes, a network intelligence company that delivers visibility into every network, where she helps bring light to the world of network monitoring. Previously, she spent 10 years in various roles including technical marketing, product management and solution testing. She is a trained classical dancer, loves dogs, and obsesses over rainy days.