When building a reliable network, you must plan and design for inevitable system errors. Even with today’s advances in technology, hardware and software failures still occur. To handle these challenges, Amazon introduced a feature in January that continues to show the robustness of its cloud. Auto recovery for EC2 instances detects when an instance has been impaired by an underlying failure and automatically recovers it on healthy hardware. Let’s take a look at how this new feature works as well as its potential issues.
How Does It Work?
While an EC2 instance is running, a series of system status checks (reported through AWS CloudWatch) monitor the instance as well as the underlying components it needs in order to function. These checks cover a number of conditions, including loss of network connectivity, software problems on the physical host, and hardware failures. When one of the system status checks fails, auto recovery can restart the instance on new virtual hardware, but it will not roll the EBS volume back to its state before the failure occurred. Note that you have to set up auto recovery on an existing instance in order for it to work. Additionally, the recovered instance will have the same ID, private IP address and metadata, though it will get a new public IP address.
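To make the status checks more concrete, here is a minimal sketch of how you might pick out instances whose system status check has failed. It assumes a response shaped like the one returned by boto3's `describe_instance_status` call on the EC2 client; the helper name `impaired_instance_ids` and the sample instance IDs are made up for illustration.

```python
def impaired_instance_ids(status_response):
    """Return IDs of instances whose system status check reports 'impaired'.

    `status_response` is assumed to have the shape returned by boto3's
    EC2 client method describe_instance_status().
    """
    impaired = []
    for entry in status_response.get("InstanceStatuses", []):
        # SystemStatus covers the underlying host; InstanceStatus covers the guest.
        system_status = entry.get("SystemStatus", {}).get("Status")
        if system_status == "impaired":
            impaired.append(entry["InstanceId"])
    return impaired


# Hand-written sample response; the instance IDs are placeholders.
sample_response = {
    "InstanceStatuses": [
        {"InstanceId": "i-0aaaaaaaaaaaaaaaa", "SystemStatus": {"Status": "ok"}},
        {"InstanceId": "i-0bbbbbbbbbbbbbbbb", "SystemStatus": {"Status": "impaired"}},
    ]
}

print(impaired_instance_ids(sample_response))  # ['i-0bbbbbbbbbbbbbbbb']
```

Against a real account, the response would come from something like `boto3.client("ec2").describe_instance_status()`. Auto recovery reacts to this same system status signal, so a check like this is mainly useful for reporting or for instances where auto recovery has not been set up.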
Auto recovery supports only newer instance types such as C4 and T2; the more classic types are not supported. Additional limitations are listed in the AWS documentation.
Auto recovery is configured as a CloudWatch alarm action, and the setting can be found in the instance’s CloudWatch alarm window in the EC2 console.
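The same setting can also be applied from code. The sketch below builds the parameters for a CloudWatch alarm on the `StatusCheckFailed_System` metric with the EC2 recover action, then passes them to boto3's `put_metric_alarm`. The helper names, the alarm name, and the period and evaluation values are illustrative assumptions, not recommendations from AWS or from this article.

```python
def build_recover_alarm(instance_id, region):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when the system status check fails and triggers the
    EC2 recover action. Timing values below are illustrative assumptions.
    """
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "AlarmDescription": "Recover the instance when the system status check fails",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 60,               # seconds per evaluation period (assumed)
        "EvaluationPeriods": 2,     # consecutive failing periods before acting (assumed)
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def enable_auto_recovery(cloudwatch, instance_id, region):
    """Create or update the recovery alarm using a boto3 CloudWatch client."""
    cloudwatch.put_metric_alarm(**build_recover_alarm(instance_id, region))
```

With boto3 installed and credentials configured, a call might look like `enable_auto_recovery(boto3.client("cloudwatch", region_name="us-east-1"), "i-0aaaaaaaaaaaaaaaa", "us-east-1")`, where the instance ID and region are placeholders. Keeping the parameter builder separate from the API call makes it easy to inspect the alarm definition before anything is created.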
The Data Corruption Challenge
There are some challenges that the auto recovery feature doesn’t handle. If a hardware fault interrupts reads and writes to an EBS volume, auto recovery won’t necessarily retrieve the lost data, because the in-memory data would be lost or corrupted. However, you can automate snapshot recovery and create a new EBS volume with the data from your most recent snapshot. Learn how to automate EBS snapshots.
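As a rough illustration of snapshot recovery, the sketch below picks the newest completed snapshot of a volume and creates a new EBS volume from it. It relies on boto3's `describe_snapshots` and `create_volume` calls on the EC2 client; the function names are hypothetical, and the code assumes the snapshots fit in a single response page.

```python
def latest_completed_snapshot(snapshots):
    """Return the most recent snapshot in the 'completed' state, or None."""
    completed = [s for s in snapshots if s.get("State") == "completed"]
    if not completed:
        return None
    return max(completed, key=lambda s: s["StartTime"])


def restore_volume_from_snapshot(ec2, volume_id, availability_zone):
    """Create a new EBS volume from the newest completed snapshot of volume_id.

    `ec2` is a boto3 EC2 client. Assumes a single page of results; a real
    script would paginate and wait for the new volume to become available.
    """
    response = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    )
    snapshot = latest_completed_snapshot(response["Snapshots"])
    if snapshot is None:
        raise RuntimeError(f"no completed snapshot found for {volume_id}")
    new_volume = ec2.create_volume(
        SnapshotId=snapshot["SnapshotId"],
        AvailabilityZone=availability_zone,
    )
    return new_volume["VolumeId"]
```

The new volume must be created in the same Availability Zone as the instance it will be attached to. Anything written after the chosen snapshot was taken is not recovered, which is why the snapshot schedule matters as much as the restore step.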
Another problem that auto recovery does not cover is a corrupted system disk. Recovery requires the operating system to reboot, and with a corrupted disk that reboot may fail, which in turn would cause auto recovery to fail. Data corruption is best handled by running a proper cloud backup policy, for example with CPM, a native cloud backup solution for AWS users.
The instance auto recovery feature makes Amazon EC2 a more reliable platform, with enough flexibility to keep databases and mission-critical applications afloat when system failures occur. Although the feature has no built-in way to handle data corruption, that protection can and should be added to the process. Auto recovery is a great addition to Amazon’s cloud as a tool for keeping pace with the business world through clear and stormy weather alike. Learn how EBS Snapshots and EBS Volume restore work.