Amazon Outage: AWS Lambda Function Issue Causes East Coast Service Outages

(Credit: Stu Gray / Alamy Stock Photo)

An AWS outage tied to AWS Lambda function invocation affected more than 100 services yesterday. The impact spanned administrative, management, and functional services, including Amazon Relational Database Service, AWS Single Sign-On, AWS Identity and Access Management, AWS Certificate Manager, and more.

When and where did the outage occur?

The incident was first noted around 3 p.m. ET and resolved by 6:30 p.m. ET. It was centered in the Northern Virginia facility and affected numerous East Coast businesses served by this center. According to AWS, "We experienced increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region."

AWS traced the root cause to an issue with a subsystem responsible for capacity management for AWS Lambda, which caused errors both directly for customers (including those using API Gateway) and indirectly through other AWS services. Additionally, some users experienced authentication or sign-in errors when using the AWS Management Console or when trying to authenticate through Amazon Cognito or IAM STS. (Making matters worse, some customers had trouble initiating a call or chat with AWS Support.)

By about 4:40 p.m. ET, the underlying issue with the Lambda capacity-management subsystem was resolved. It then took until roughly 6:30 p.m. ET to process the backlog of asynchronous Lambda invocations that had accumulated during the event.
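Why does recovery lag the fix? Simple queue arithmetic helps explain it. The sketch below is an illustration under stated assumptions, not a model of AWS internals: it assumes constant per-minute arrival and processing rates, and every number in it (1,000 events per minute arriving, 400 per minute processed while degraded, 1,500 per minute once healthy, a 100-minute impairment) is hypothetical. The functions `simulate_backlog` and `minutes_to_drain` are likewise invented for this example.

```python
def simulate_backlog(arrival_rate, degraded_capacity, healthy_capacity,
                     outage_minutes, horizon_minutes):
    """Per-minute queue depth of pending asynchronous invocations.

    Hypothetical model: rates are constant and time advances in 1-minute steps.
    """
    backlog = 0
    history = []
    for minute in range(horizon_minutes):
        backlog += arrival_rate  # new async events keep arriving regardless
        capacity = degraded_capacity if minute < outage_minutes else healthy_capacity
        backlog -= min(backlog, capacity)  # process as much as capacity allows
        history.append(backlog)
    return history


def minutes_to_drain(history, outage_minutes):
    """Healthy minutes needed before the backlog first returns to zero."""
    for minute in range(outage_minutes, len(history)):
        if history[minute] == 0:
            return minute - outage_minutes + 1
    return None  # backlog not drained within the simulated horizon


if __name__ == "__main__":
    # Hypothetical numbers chosen for illustration, not AWS figures.
    history = simulate_backlog(arrival_rate=1000, degraded_capacity=400,
                               healthy_capacity=1500, outage_minutes=100,
                               horizon_minutes=300)
    print(history[99])                     # backlog at recovery → 60000
    print(minutes_to_drain(history, 100))  # → 120
```

Once the fault is fixed, the backlog drains only at the spare capacity (healthy capacity minus the ongoing arrival rate). Here that is 60,000 / (1,500 − 1,000) = 120 minutes, even though the subsystem itself is already healthy.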

Pervasive AWS Lambda use made outage extensive

Multiple businesses and organizations, such as the Boston Globe and New York’s MTA, reported problems via Twitter.

Why did the impact reach so many AWS services? Part of the answer is that serverless computing, such as that offered by AWS Lambda, is on the rise as organizations move to the cloud or modernize their applications by adopting cloud-native architectures.

Specifically, AWS Lambda is a serverless, event-driven compute service that lets enterprises run code for virtually any type of application or backend service without provisioning or managing servers. A company can trigger Lambda from over 200 AWS services and software-as-a-service (SaaS) applications and pays only for what it uses.

As a result, Lambda is widely used. In fact, two in three companies are adopting serverless Lambda functions, according to Steve Dietz, field CTO at Sumo Logic, in an online talk. The outage and degraded performance were therefore a double whammy: more companies are using serverless functions, and most of the cloud services they incorporate into their applications and infrastructure are based on serverless capabilities.
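To make the event-driven model concrete, here is a minimal sketch of a Python Lambda handler. Lambda calls the configured handler with the triggering event as a dict and a runtime `context` object. The handler name `lambda_handler`, the bucket and key values, and the trimmed-down S3 notification event are assumptions for illustration; real S3 events carry many more fields.

```python
import json


def lambda_handler(event, context):
    """Entry point Lambda invokes once per event (or batch of records).

    `event` is the triggering payload as a dict; `context` carries runtime
    metadata and is unused in this sketch.
    """
    processed = []
    # S3 notifications group one or more object events under "Records".
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        processed.append(f"{bucket}/{key}")
    # A synchronous caller such as API Gateway receives this return value;
    # the statusCode/body shape follows the API Gateway proxy convention.
    return {"statusCode": 200, "body": json.dumps({"processed": processed})}


if __name__ == "__main__":
    # Hypothetical, heavily trimmed S3 event used only for a local run.
    sample_event = {
        "Records": [
            {"eventSource": "aws:s3",
             "s3": {"bucket": {"name": "example-bucket"},
                    "object": {"key": "uploads/report.csv"}}}
        ]
    }
    print(lambda_handler(sample_event, None))
```

Because the same function can be wired to many event sources, a single problem in Lambda's shared infrastructure can surface in every service and application that relies on it.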

A post-outage analysis

Unlike many cloud outages of the past year, this incident did not appear to be caused by a configuration error. Causes of some of those earlier events included a faulty configuration change to the backbone routers (related to Border Gateway Protocol) and a configuration change that affected a provider's load-balancing systems. Other incidents were power-related.

In this case, the problem may have been one of limited capacity or excessive usage. AWS reported increased error rates and latencies for multiple AWS services and attributed the root cause to an issue with services invoking AWS Lambda.
