Major AWS Outages and 6 Tips for Surviving One

Find out the 4 business impacts of outages, common causes, recent examples, and how to stay operational during an AWS outage.

What Is an AWS Outage? 

An AWS outage refers to a significant disruption in the availability or performance of Amazon Web Services (AWS), one of the world’s largest cloud service providers. During an outage, AWS customers may lose access to their infrastructure, data, or services hosted on the platform. 

These disruptions can be localized, impacting specific AWS regions or availability zones, or widespread, affecting multiple global regions simultaneously, depending on the root cause and the architecture of the affected systems.

Outages can range from brief, minor incidents to major, hours-long disruptions with broad implications for businesses and users worldwide. Given AWS’s role in powering a wide variety of applications, from streaming media and ecommerce to enterprise productivity tools, the consequences can be immediate and severe. 

Understanding and preparing for AWS outages is crucial for organizations building or operating critical workloads on the platform. Even when outages are not catastrophic, they can disrupt and damage mission-critical applications.

This is part of a series of articles about AWS backup

4 Business Impacts of AWS Outages 

Here are some of the possible impacts of AWS outages:

  1. Operational downtime: During an AWS outage, operational downtime can be a significant issue for businesses relying on AWS for their critical systems. Depending on the services affected, organizations may experience disruptions in their ability to access or process data, communicate internally, or serve their customers.
  2. Revenue loss: For businesses that rely on AWS-hosted applications or services for customer-facing products, an outage can lead to direct revenue loss. Ecommerce platforms, for example, may be unable to process transactions, leading to missed sales opportunities.
  3. Compliance implications (data integrity): Many businesses rely on AWS to store sensitive customer data, which can include personally identifiable information (PII), financial records, and other compliance-regulated data. In the event of an AWS outage, organizations may not be able to ensure the availability or integrity of their data, potentially violating industry regulations or contractual obligations.
  4. Reputational damage: Customers and users expect reliable, always-on access to services, and a significant disruption can erode trust in a company’s ability to deliver. Businesses that are unable to resolve outages quickly or communicate effectively with users may face a long-lasting negative impact on their brand image.

Common Causes of AWS Outages

Connectivity and Network Disruptions

Network disruptions are a frequent cause of AWS outages. Issues may stem from internet backbone connectivity failures, distributed denial of service (DDoS) attacks, or routing misconfigurations within AWS’s internal networks. Even minor network instability can cascade, impacting cloud-based services dependent on uninterrupted data transmission.

These disruptions can isolate resources, prevent access to critical APIs, or even partition regions, affecting latency-sensitive and interconnected cloud applications. Organizations can find themselves temporarily disconnected from vital systems, with no immediate recourse but to wait for AWS engineers to diagnose and resolve root causes.

Misconfigurations and Human Error

Operational mistakes by AWS engineers have been a recurring cause of major service disruptions. Examples include incorrect updates to access controls, routing tables, or automation scripts that govern large parts of the AWS infrastructure. A single misapplied change, when propagated across multiple regions or availability zones, can trigger cascading failures affecting core services like S3 or EC2.

These incidents often stem from insufficient safeguards in deployment pipelines or gaps in change management processes. Even with automation in place, manual interventions during emergency maintenance or configuration updates can introduce errors at scale.

Power and Hardware Failures

Physical hardware failures—such as the breakdown of servers, storage arrays, or network switches—can trigger partial or complete outages in AWS’s extensive data centers. While AWS maintains redundancy and automated failover mechanisms, massive hardware issues may overwhelm these safeguards, causing interruptions in availability or performance.

In some cases, hardware failures occur alongside environmental factors, such as power outages or cooling failures, further compounding recovery efforts. Recovery requires coordinated troubleshooting, device replacement, and sometimes the migration of workloads to alternative hardware, which may not always be instantaneous.

Software Bugs and Patches

Software bugs—either in AWS platform components or third-party integrations—have also caused outages. Faults introduced by new code deployments, firmware updates, or infrastructure patches can propagate quickly across cloud environments, impacting thousands of customers. These incidents may manifest as degraded performance, data loss, or total service unavailability.

Another risk is that patches intended to resolve other issues might inadvertently create new vulnerabilities or incompatibilities. Tight release cadences and insufficient real-world testing further increase the likelihood of bugs escaping detection before reaching production environments, emphasizing the importance of staged rollouts and robust monitoring.

Control Plane Outages

Not all AWS outages involve broken hardware or failed networks—sometimes it’s the control plane that fails. When the AWS API or Management Console is unavailable, you may not be able to start new EC2 instances, modify security groups, restore EBS snapshots, or update Route 53 records, even if your underlying data is perfectly healthy.

If your backups live only in AWS, this can leave you unable to restore until the control plane is back online. The best mitigation? Store copies of your backups in another cloud (like Azure or Wasabi) and back up your N2W server there too. This “escape hatch” lets you boot N2W in a different cloud and restore workloads—complete with networking, IAM, and DNS—without touching AWS APIs.

Pro tip: Back up your N2W server to another cloud, so if AWS’s control plane is impacted, you can boot N2W elsewhere and restore instantly. That’s your “full escape hatch” when AWS status pages are still glowing red.

External Dependency Failures

Sometimes AWS isn’t the weak point at all. Outages can be triggered by failures in third-party services that your AWS-hosted workloads depend on—such as DNS providers, identity management platforms, or even expired SSL/TLS certificates.

For example, a certificate expiration in an external identity provider could block all user logins to your application, even though your AWS infrastructure is fully operational. To reduce this risk, monitor certificate lifecycles, avoid hard dependencies on a single external service, and ensure your architecture fails gracefully when an upstream provider stumbles.
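
As one way to monitor certificate lifecycles, the sketch below reads the expiry date of a server's TLS certificate and flags it when renewal is due. It uses only the Python standard library. The 30-day warning window and the function names are assumptions chosen for illustration, not a recommendation from any particular provider.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate presented by host:port.

    Note: with the default context, the handshake itself fails if the
    certificate has already expired, so treat an SSLError as an alert too.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and a notAfter string."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within the warning window."""
    return days_until_expiry(not_after, now) <= warn_days
```

A scheduled job could call `fetch_not_after()` for each external dependency and raise an alert whenever `needs_renewal()` returns true.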

Tips from the Expert
Jessica Eisenberg
Jessica is Senior Global Campaigns Manager at N2WS with more than 10 years of experience. She enjoys very spicy foods, lifting heavy things and cold snowy mountains (even though she lives near the arid desert).

Examples of Major AWS Outages 

February 2025: Networking failure in eu-north-1 disrupts multiple services

On February 13, 2025, a regional networking fault in AWS’s eu-north-1 (Stockholm) region caused widespread service degradation. The incident originated in Availability Zone eun1-az3 and affected intra-region traffic, leading to latency spikes and elevated error rates across services such as EC2, S3, Lambda, DynamoDB, and CloudWatch.

Although traffic into and out of the region remained stable, internal service-to-service communication was disrupted. This impacted core platform services and serverless operations. Recovery began within an hour, with full restoration achieved by 04:05 UTC on February 14. The event highlighted how localized internal network issues can cascade through dependent cloud services, even without external connectivity loss.

June 2023: Regional outage in us-east-1 impacts high-profile customers

On June 13, 2023, AWS services in us-east-1 suffered elevated error rates and slow response times. The outage affected many services including Lambda, IAM, SQS, and EventBridge. Amazon Connect experienced extensive failures, with broken chat sessions and agent login problems.

While partial recovery began within a few hours, backlogs and delayed processing extended the full resolution timeline. The incident affected customers like The Boston Globe and the New York MTA, reinforcing the operational risks of relying on a single high-traffic region like us-east-1.

July 2022: Power loss in Ohio AZ1 causes EC2 disruptions

A power failure on July 28, 2022, in Availability Zone 1 of the us-east-2 (Ohio) region led to degraded EC2 performance and extended recovery times. Although the power interruption lasted just 20 minutes, some services took several hours to stabilize.

Third-party applications such as Webex, Okta, and Splunk experienced authentication issues, API failures, and connectivity problems. This incident revealed how physical infrastructure issues can propagate across software layers and external applications.

September 2021: EBS storage issue causes 8-hour outage in us-east-1

On September 26, 2021, a stuck I/O issue in Amazon EBS triggered a prolonged outage in us-east-1. EC2 instances became impaired, and new launches failed across affected zones. Since many AWS services rely on EBS for persistent storage, the issue disrupted Redshift, ElastiCache, and RDS as well.

The eight-hour incident demonstrated the vulnerability of shared storage systems and the critical need for resilient architecture in dependent services.

How to Stay Operational During an AWS Outage

In the event of an AWS outage, maintaining business continuity is crucial. While AWS offers a highly reliable infrastructure, outages can still occur, and organizations must be prepared. This section outlines strategies to help businesses stay operational during an AWS disruption, including multi-cloud approaches, disaster recovery plans, and effective communication with customers.

1. Implement Multi-Cloud Strategies

One of the most effective ways to remain operational during an AWS outage is by using a multi-cloud strategy. By distributing critical workloads across multiple cloud providers—such as Google Cloud Platform (GCP) or Microsoft Azure—you can avoid total dependence on a single platform. 

A multi-cloud approach allows organizations to shift workloads to another cloud provider in the event of an outage, ensuring continuity even when AWS experiences issues. While this introduces complexity, especially in managing cross-cloud infrastructure, it greatly improves fault tolerance.
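
The routing decision behind such a shift can be sketched as a priority list of providers with a health probe for each. The provider names and the `is_healthy` callback below are hypothetical; in practice the probe would be whatever check your deployment already trusts, such as a synthetic request against each environment.

```python
from typing import Callable, Sequence


def pick_active_provider(
    providers: Sequence[str], is_healthy: Callable[[str], bool]
) -> str:
    """Return the first provider, in priority order, whose probe succeeds."""
    for provider in providers:
        try:
            if is_healthy(provider):
                return provider
        except Exception:
            # A probe that raises is treated the same as an unhealthy result.
            continue
    raise RuntimeError("no healthy provider available")
```

For example, `pick_active_provider(["aws", "azure"], probe)` keeps AWS as the default and falls back to Azure only when the AWS probe fails.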

2. Leverage Hybrid Cloud and Air-Gapped Disaster Recovery (DR)

Hybrid cloud setups combine on-premises infrastructure with AWS cloud services. Organizations that maintain certain critical systems on-premises or in private clouds can continue operations when AWS services are disrupted. 

For example, in a scenario where AWS faces an extended outage, an air-gapped DR setup—where data and applications are stored on completely isolated, offline systems—provides a secure and independent backup. This ensures data integrity and access during prolonged AWS outages, but it requires careful planning to maintain up-to-date backups and rapid failover capabilities.
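
Keeping offline copies up to date is easier to verify with an automated freshness check. The sketch below compares each workload's most recent backup time against a recovery point objective (RPO). The workload names and the 24-hour RPO in the usage note are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_backups(
    last_backup_times: Dict[str, datetime], rpo: timedelta, now: datetime
) -> List[str]:
    """Return, sorted by name, the workloads whose latest backup is older than the RPO."""
    return sorted(
        name
        for name, taken_at in last_backup_times.items()
        if now - taken_at > rpo
    )
```

Calling `stale_backups(times, timedelta(hours=24), now)` returns the workloads that would lose more than a day of data if you had to fail over right now.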

Learn more in our detailed guide to AWS disaster recovery

3. Use Caching and Offline Functionality

When relying on AWS services, building local caches for critical data can allow some applications to function during an outage. For example, Amazon CloudFront can keep serving frequently accessed content from its edge caches, while local storage systems can hold key data for offline access.
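
On the application side, one common pattern is to serve the last known value when the upstream call fails. The class below is a minimal in-memory sketch of that idea. The `fetch` callback and the 60-second TTL are assumptions, and a production cache would also need size limits and thread safety.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Cache that falls back to an expired entry when the upstream fetch fails."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        cached = self._store.get(key)
        now = time.monotonic()
        # Fresh entry: answer from the cache without calling upstream.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        try:
            value = self._fetch(key)
        except Exception:
            # Upstream is unavailable: serve the stale copy if one exists.
            if cached is not None:
                return cached[1]
            raise
        self._store[key] = (now, value)
        return value
```

Keys that were never fetched before the outage still fail, so it can help to warm the cache with the data your critical paths depend on.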

For applications that rely on AWS for real-time processing, a local buffer or a queue that sits outside the affected service (such as a self-managed Apache Kafka cluster, or Amazon SQS in an unaffected region) can ensure that data isn’t lost during downtime and is processed once the cloud service is back online.
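
A local buffer of this kind can be sketched with SQLite from the Python standard library: events are written to a local table first and forwarded once the upstream service accepts them again. The `LocalOutbox` class and the `send` callback are hypothetical names. Delivery here is at-least-once, so the receiving side should tolerate duplicates.

```python
import sqlite3
from typing import Callable


class LocalOutbox:
    """Durable local queue that holds events until the upstream is reachable."""

    def __init__(self, path: str = ":memory:"):
        # Pass a file path instead of ":memory:" to survive process restarts.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._db.commit()

    def enqueue(self, payload: str) -> None:
        self._db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
        self._db.commit()

    def pending(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, send: Callable[[str], None]) -> int:
        """Send buffered events in order; stop at the first failure."""
        rows = self._db.execute(
            "SELECT id, payload FROM outbox ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            try:
                send(payload)
            except Exception:
                break  # Upstream still down: keep the remaining rows.
            self._db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self._db.commit()
            sent += 1
        return sent
```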

4. Rely on Sufficient Failover and Redundancy

In the event of an AWS outage, failover mechanisms can help minimize disruptions. By designing systems with failover to backup regions or availability zones, workloads can be redirected from the affected zone to a healthy one.

For example, Amazon Route 53 health checks and failover routing can automatically redirect traffic to an alternative region during service interruptions. Ensuring adequate redundancy in application components and data storage systems allows for recovery without significant service disruption.
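
As an illustration of DNS-based failover, the sketch below builds the change batch for a pair of Route 53 failover records. It only constructs the request data. The domain name, IP addresses, and health check ID in the usage note are placeholders, and the helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def failover_record(
    name: str,
    ip: str,
    role: str,
    health_check_id: Optional[str] = None,
    ttl: int = 60,
) -> Dict[str, Any]:
    """Build one UPSERT change for an A record with failover routing."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record: Dict[str, Any] = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,
        "TTL": ttl,  # A short TTL lets resolvers pick up the failover sooner.
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(
    name: str, primary_ip: str, secondary_ip: str, primary_health_check_id: str
) -> Dict[str, Any]:
    """Primary record guarded by a health check, plus a standby record."""
    return {
        "Comment": "Failover pair (illustrative)",
        "Changes": [
            failover_record(name, primary_ip, "PRIMARY", primary_health_check_id),
            failover_record(name, secondary_ip, "SECONDARY"),
        ],
    }
```

The result of `failover_change_batch("app.example.com.", "192.0.2.10", "198.51.100.10", "<health-check-id>")` could be passed as the `ChangeBatch` argument of boto3's `change_resource_record_sets`. Creating these records ahead of time matters, because changing DNS during an incident depends on the same control plane that may be impaired.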

5. Continuously Monitor AWS Health and Alerts

AWS provides the AWS Health Dashboard, which publishes updates on service status across regions. Setting up alerts based on these updates, or using third-party monitoring solutions like Datadog or New Relic, can help businesses quickly detect when issues arise.

Early detection allows organizations to take preemptive steps, such as redirecting traffic or activating failover systems, to reduce the impact of the outage.
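
One way to turn health updates into alerts is to route AWS Health events through Amazon EventBridge and filter them in code. The sketch below shows an event pattern and a filter function. The watched services and regions are assumptions you would replace with your own, and the sample event in the usage note is trimmed to the fields the filter reads.

```python
from typing import Any, Dict, Iterable

# EventBridge rule pattern that matches events published by AWS Health.
HEALTH_EVENT_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
}


def should_alert(
    event: Dict[str, Any],
    watched_services: Iterable[str],
    watched_regions: Iterable[str],
) -> bool:
    """True for service issues that touch a watched service in a watched region."""
    if event.get("source") != "aws.health":
        return False
    detail = event.get("detail", {})
    # Skip scheduled changes and account notifications; keep active issues.
    if detail.get("eventTypeCategory") != "issue":
        return False
    return (
        detail.get("service") in set(watched_services)
        and event.get("region") in set(watched_regions)
    )
```

For example, an event with `"region": "us-east-1"` and `"detail": {"service": "EC2", "eventTypeCategory": "issue"}` would trigger an alert when `EC2` and `us-east-1` are on the watch lists.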

6. Have Contingency Plans for Customer Communication

In the event of an AWS outage, customer trust can be eroded if communication is delayed or unclear. Organizations should have predefined communication strategies in place, ensuring timely updates about the status of services and steps being taken to restore functionality. Transparency is key—informing customers of expected downtimes and offering alternatives or compensation when appropriate can help mitigate reputational damage.

Related content: Read our guide to AWS backup vault

Mitigating the Impact of AWS Outages with N2W Disaster Recovery

When AWS goes dark, your recovery speed depends on two things:

  1. How quickly you can access your backups.
  2. Whether you can restore them without depending on AWS’s control plane.

N2W gives you both, with:

  • Cross-cloud disaster recovery: Instantly restore AWS workloads into Azure if AWS is down—no waiting for us-east-1 to recover.
  • Full environment rebuilds: Bring back EC2 instances, networking configs, IAM roles, DNS records—everything your workloads need to actually run.
  • Immutable backups: Your data is untouchable, even during a ransomware attack or accidental deletion.
  • Automated DR drills: Test failovers regularly so you know your recovery plan works before you need it.
  • Cost-smart archiving: Keep backups in low-cost storage. Because we archive immutable snapshots (not full ones, as AWS does) and let you choose the storage tier you want, you can actually save money.

📘 Want the full playbook?

Download the Cloud Outage Survival Guide and see how to keep your business running through the next AWS outage.
