If you hadn’t realized how much of our digital world now lives in the cloud, the Amazon Web Services (AWS) outage last December offered a rude awakening. Within minutes, whole swathes of the Internet—a laundry list of enterprises large and small—found their services severely impaired or knocked offline altogether. The outage itself wasn’t that surprising; even the biggest cloud companies have problems from time to time. What was surprising, however, was the number of companies affected that hadn’t even realized they depended on AWS cloud services.
Modern digital applications have become massively complex, sprawling entities, with bits and pieces coming from dozens of third-party suppliers, often on demand. This means your exposure to service disruptions is no longer limited to where you host your own digital assets. You now have to worry about where all those digital partners host their assets too. And in too many cases, companies are not doing what they need to do to safeguard their business from this tightly interconnected chain of dependencies.
Even if you weren’t personally affected by the recent outage, let’s use this as a learning experience for our industry. The digital world is very different from what it was a decade ago, and our monitoring strategies need to evolve. The number one action item: make sure that the things you’re monitoring, and the services you’re using to monitor them, don’t all live in the same cloud.
A new spin on an old problem
Businesses first confronted closed-loop monitoring interdependencies back in the 1990s, when most web assets lived on-premises. If your monitoring tools live in the same system you’re monitoring, and that system goes down, how do you know you have a problem? The answer, of course, is you don’t. You’re effectively flying blind.
That’s why monitoring solutions were among the first to become software-as-a-service offerings—even before the term “SaaS” existed. As the monitoring industry has grown over the past 20 years, most of that growth has come in SaaS. But in recent years, another big digital trend has dominated: the move to the cloud. And while cloud has been hugely beneficial for thousands of companies, it’s also created new problems. Or rather, it’s resurfaced old problems we haven’t had to worry about for a while.
When web applications get more interconnected and interdependent, and everything moves to the cloud, we’re basically back where we were two decades ago. That is, if my critical infrastructure lives in one of the big public clouds, and my monitoring provider lives there too, what happens if that cloud goes down? Well, many businesses just found out.
Building a better monitoring strategy
In today’s tightly interconnected digital landscape, you need to monitor everything you outsource and continually verify that your vendors are meeting the service-level agreements (SLAs) they’ve committed to. Critical to doing that, you need to ensure that the tools you’re using to monitor those third parties and SLAs do not live in the same place as the vendors you’re monitoring.
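As a rough illustration of that check, here is a minimal Python sketch that flags closed-loop pairings, where a service and the tool watching it share a hosting provider. The service names, tool names, and hosting labels (`cloud-a`, `probe-x`, and so on) are hypothetical placeholders, not real vendors; in practice the hosting data would come from your own vendor inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str          # the service or vendor being monitored
    hosted_on: str     # where it is hosted (hypothetical label)
    monitored_by: str  # the monitoring tool that watches it


def find_closed_loops(services, monitor_hosting):
    """Return (service, monitor, host) for every service whose
    monitoring tool is hosted in the same place as the service."""
    findings = []
    for svc in services:
        monitor_host = monitor_hosting.get(svc.monitored_by)
        if monitor_host is not None and monitor_host == svc.hosted_on:
            findings.append((svc.name, svc.monitored_by, monitor_host))
    return findings


# Hypothetical inventory, for illustration only.
inventory = [
    Service("storefront", "cloud-a", "probe-x"),
    Service("dns-vendor", "cloud-b", "probe-x"),
    Service("tax-lookup", "cloud-a", "probe-y"),
]
monitor_hosting = {"probe-x": "cloud-a", "probe-y": "independent"}

closed_loops = find_closed_loops(inventory, monitor_hosting)
for service, monitor, host in closed_loops:
    print(f"{service} and its monitor {monitor} both live in {host}")
```

With this sample data, only the storefront is flagged: it and its monitoring tool would go dark together if `cloud-a` went down.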
How can you protect your business from getting blindsided by the next big cloud outage? Start with these guidelines:
- Thoroughly evaluate your digital assets for closed-loop interdependencies: It sounds obvious, but the most important thing you can do is make sure that whatever tools you’re using to monitor your business-critical services are not hosted in the same location as those services. That’s not necessarily simple when practically every company now runs in the cloud. Even a multi-cloud strategy won’t necessarily help you here, since it means you need trustworthy data about vendors in multiple clouds instead of just one. In many cases, your best bet may be to look to the monitoring providers that hyperscale cloud companies use themselves: providers that don’t run from the cloud.
- Identify single points of failure: Maintain a comprehensive inventory of every vendor that touches your digital services and where their services are hosted. Who handles your Domain Name System (DNS) service? Your Content Delivery Network (CDN)? If you’re in eCommerce, which third parties are involved in the checkout process, from local tax to address lookups and more? Make sure you’re observing the end-to-end delivery chain and that you know which third-party components will shut you down if they go down, and which will slow your services if they get slow. Try to implement redundancy around critical services wherever possible.
- Watch out for new blind spots when changing vendors: The mix of digital vendors you rely on, and the specific clouds where those services are hosted, can change all the time. So, this evaluation needs to be an ongoing effort. Be particularly mindful when making broad organizational changes. For example, in recent years, as the number of vendors companies work with has exploded, many CIOs have been given mandates to consolidate. On paper, it might seem straightforward to just cut some vendors. But if you’re not careful, it can be all too easy to deviate from your monitoring strategy—and introduce new blind spots when you do.
- Make sure you’re actually monitoring SLAs: Most IT departments now use SLAs for their own services and require SLAs from their vendors. Indeed, the thorniest parts of vendor negotiations often stem from demands for strict SLA terms. And yet, despite their legal departments battling to get those terms in the contract, many companies don’t even monitor their vendor SLAs.
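To make the last guideline concrete, here is a minimal Python sketch of an SLA check. It assumes you already collect pass/fail results from independent availability probes for each vendor. The vendor names, targets, and probe counts are invented for illustration, and real SLAs often define availability in more detail (measurement windows, exclusions, latency thresholds) than this simple success ratio.

```python
def availability(results):
    """Fraction of successful checks in a list of booleans."""
    if not results:
        raise ValueError("no probe results")
    return sum(results) / len(results)


def sla_report(probe_results, sla_targets):
    """Compare observed availability against each vendor's SLA target.

    Vendors that have a target but no probe data are reported as
    'not monitored', which is the blind spot described above.
    """
    report = {}
    for vendor, target in sla_targets.items():
        results = probe_results.get(vendor)
        if not results:
            report[vendor] = (None, "not monitored")
            continue
        observed = availability(results)
        status = "met" if observed >= target else "breached"
        report[vendor] = (observed, status)
    return report


# Hypothetical probe data: True means the check succeeded.
probe_results = {
    "cdn-vendor": [True] * 9995 + [False] * 5,  # 99.95% observed
    "dns-vendor": [True] * 995 + [False] * 5,   # 99.5% observed
}
# Hypothetical contractual availability targets.
sla_targets = {"cdn-vendor": 0.999, "dns-vendor": 0.999, "tax-lookup": 0.995}

report = sla_report(probe_results, sla_targets)
for vendor, (observed, status) in report.items():
    shown = "n/a" if observed is None else f"{observed:.2%}"
    print(f"{vendor}: observed {shown}, SLA {status}")
```

In this made-up data, one vendor meets its target, one breaches it, and one has a contractual SLA that nobody is measuring at all.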
Even major outages won’t drive businesses to pull back from the cloud. We’re all far too dependent on modern cloud services and the benefits they provide. But as an industry, we need to be cognizant of just how interdependent those services have made modern digital businesses—and how easy it can be to introduce monitoring blind spots as a result.
A well-crafted monitoring strategy won’t necessarily shield you from being affected by the next major cloud outage. But, by keeping a close eye on your cloud-based vendors—and doing it with tools that don’t live in that cloud—you can make sure you always have a clear picture of what’s happening and the information you need to respond.
Dritan Suljoti is Chief Product Officer and Co-Founder at Catchpoint.