Over the last two years, some of the most significant outages affected major infrastructure and service providers. Internet giants such as Akamai, Amazon, Facebook, Fastly, and Salesforce all experienced failures. If there’s one lesson these outages teach us, it’s that the Internet is more interconnected and fragile than ever. We’ve reached a tipping point where a minor disruption to a single provider or component can have a major domino effect on the wider web.
The Butterfly Effect of Outages
Take Meta’s well-publicized outage of 2021, for example, when Facebook, Instagram, WhatsApp, Messenger, and Oculus VR experienced simultaneous and prolonged downtime worldwide. What quickly became apparent was that the outage was impacting the page load time of many popular websites that were not powered by Facebook. Why? Because Facebook's ads and marketing tags are present on almost every major website.
Similarly, when AWS went down a few months after the Meta outage, it took down major online services such as Amazon, Amazon Prime, Amazon Alexa, Venmo, Disney+, Instacart, Roku, Kindle, and multiple online gaming sites. While no outage in 2022 eclipsed Meta’s 2021 incident, the year continued the trend of worldwide outages, notably amongst social media companies, which made up 70% of the largest outages.
Amazon’s recent search failures showed that an outage doesn’t need to take down an entire website to do damage. For 22 hours, around 20% of users worldwide were affected, and a subset of them could not use the search function at all during that time. Can you imagine the impact on global revenue?
Why Internet Resilience Is a Top Priority
So, what does this all mean for your company? In a post-pandemic landscape where every business is now effectively an online business, slow is the new down, and the ripple effect of negative user experiences can be far-reaching. Building Internet resilience has become a top priority at the board level for all these reasons, plus three additional ones:
1. Greater complexity
Not so long ago, gaining visibility into the IT systems that fueled your company was easier because everything was on the local area network (LAN). Today, everything from customer relationship management systems to email servers runs over the Internet, which, as we’ve seen, is far from magically resilient. As your Internet stack grows in complexity, finding and responding to issues quickly will become even more challenging.
2. Everything is distributed
The COVID-19 pandemic accelerated the shift towards distributed architectures, where even our workforces are now distributed. With millions of hybrid workers continuing to work on home networks that were not designed for constant high bandwidth use, it’s no coincidence that there’s been an increase in major providers going down.
3. Your IT team has less direct control
Rather than being responsible for maintaining and managing in-house software systems, your IT team is now at the mercy of third-party networks and providers because of the hypergrowth of SaaS apps. This loss of control over your organization’s mission-critical applications and services makes it more complicated to find the origin of an issue when things inevitably go wrong.
The Role of Internet Performance Monitoring in Ensuring Internet Resilience
As traditional monitoring methods become inadequate, a growing number of companies are augmenting their monitoring capabilities with Internet Performance Monitoring (IPM), built from the ground up to provide comprehensive visibility into the enterprise Internet stack. IPM will become an essential tool for organizations that rely heavily on the Internet as their network infrastructure, which, after all, is every business out there.
APM vs. IPM: Understanding the Differences
While APM and IPM are both observability solutions, they have distinct areas of emphasis. APM tools focus on optimizing the performance of an application stack, using capabilities such as tracing, discovery, and diagnostics to analyze what affects it. IPM, on the other hand, focuses on the Internet stack, analyzing key Internet performance metrics to provide insights into the customer, workforce, and application (or API) experience across the Internet.
APM and IPM also differ in their use of synthetic monitoring, RUM, and profiling tools. APM tools, for example, run their synthetic agents in the cloud. Yet customers and employees don't access systems from the cloud; they connect from a variety of devices and locations, such as their homes, workplaces, or public places, via their Internet service providers.
Why IPM is Crucial to Your Digital Resilience Strategy
In contrast, IPM captures performance from thousands of monitoring vantage points globally, providing a comprehensive view of every layer of the Internet, from ISPs and wireless carriers to BGP ASNs. So rather than competing with APM, IPM's ability to provide deep end-to-end visibility of the Internet makes it a crucial component of a company's digital resilience strategy.
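To make the vantage-point idea concrete, here is a minimal Python sketch of how measurements taken from different networks might be grouped and classified. It is an illustration only, not how any particular IPM product works: the `Sample` type, the network labels, and the 3,000 ms "slow" threshold are all assumptions invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional

# Assumed threshold for "slow is the new down"; not a figure from this article.
SLOW_THRESHOLD_MS = 3000.0


@dataclass(frozen=True)
class Sample:
    """One synthetic check result (hypothetical structure)."""
    vantage_point: str             # e.g. "home-1" or "cloud-1" (made-up labels)
    network: str                   # e.g. an ISP, carrier, or cloud region
    response_ms: Optional[float]   # None means the check failed outright


def classify(sample: Sample, slow_ms: float = SLOW_THRESHOLD_MS) -> str:
    """Label a single measurement as "ok", "slow", or "down"."""
    if sample.response_ms is None:
        return "down"
    if sample.response_ms > slow_ms:
        return "slow"
    return "ok"


def summarize_by_network(samples):
    """Count ok/slow/down checks per network and report the median timing."""
    grouped = defaultdict(list)
    for s in samples:
        grouped[s.network].append(s)

    summary = {}
    for network, group in grouped.items():
        states = [classify(s) for s in group]
        timings = [s.response_ms for s in group if s.response_ms is not None]
        summary[network] = {
            "ok": states.count("ok"),
            "slow": states.count("slow"),
            "down": states.count("down"),
            "median_ms": median(timings) if timings else None,
        }
    return summary
```

With samples from a single cloud agent and from several last-mile networks, the summary for the cloud agent can look healthy while a particular ISP shows slow or failed checks, which is the gap described above.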
Will the Internet fully collapse in 2023? Doubtful, but Internet outages are a matter of when, not if. What's clear is that no service is too big to fail, and no business is too small to be affected. The need for resilience has never been more critical to your business.
Howard Beader is Vice President of Product Marketing at Catchpoint.