We’ve talked before about the importance of recovery drills in AWS and why they must be performed on a regular basis. Reliable backup and recovery is essential for limiting the damage of data loss scenarios such as data corruption. When you recover an EC2 instance from an AMI or a snapshot, you are left with a new instance running in your account. On its own, that does not prove the recovery was successful, meaning that the application is up and running again and functioning at full capacity.
In this article, we will discuss the various levels of testing associated with running backup verification for database servers hosted on Amazon EC2.
In order for a recovered database instance to maintain functionality, you should verify both the instance and the data. Essentially, there are three levels of tests needed to verify that your backup is complete:
1. Host Level
Make sure the instance is running properly, the host has booted properly, and you are able to connect to it without interruptions. The first thing to do after launching the recovered instance is to confirm that both EC2 status checks, the system status check and the instance status check, pass. Their results are also reported as CloudWatch metrics.
- System Status Check – A check of the underlying physical host and infrastructure, covering factors such as network connectivity and system power. According to AWS, to fix a failure at this level you can stop and start the instance, or simply launch a new one.
- Instance Status Check – A check at the software and network configuration level of the instance itself. It detects problems such as a corrupted file system or a faulty startup configuration.
These status checks happen automatically. If an issue occurs during a recovery drill or in a real recovery scenario, these checks will fail. You should automate failure notifications, for example by creating CloudWatch alarms on the status check metrics.
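The host-level check can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes boto3 is installed and AWS credentials are configured, and the function names and the default region are hypothetical. The parsing helper only relies on the response shape of the EC2 DescribeInstanceStatus API.

```python
def failed_status_checks(response):
    """Return {instance_id: [names of checks whose status is not 'ok']}.

    `response` is expected to have the shape returned by the EC2
    DescribeInstanceStatus API (for example boto3's describe_instance_status).
    """
    failures = {}
    for item in response.get("InstanceStatuses", []):
        bad = [
            name
            for name in ("SystemStatus", "InstanceStatus")
            if item.get(name, {}).get("Status") != "ok"
        ]
        if bad:
            failures[item["InstanceId"]] = bad
    return failures


def check_recovered_instance(instance_id, region="us-east-1"):
    """Query EC2 for one instance (hypothetical helper; needs boto3 and credentials)."""
    import boto3  # imported lazily so the parsing helper works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instance_status(
        InstanceIds=[instance_id], IncludeAllInstances=True
    )
    return failed_status_checks(response)
```

A freshly launched instance reports its checks as "initializing" for the first few minutes, so a drill script would normally wait (for example with boto3's instance_status_ok waiter) before treating a non-"ok" status as a failure.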
2. File System (check disk)
Following these two host-level tests, you can run another type of check – a file system check. When you perform automated EBS volume backups, such as with Cloud Protection Manager, you want to make sure that the backup data is not corrupted. To do so, there are basic OS tools that test and ensure proper functionality:
- In Linux – You can run the fsck (file system consistency check) command against the file system, ideally while it is unmounted.
- In Windows – The chkdsk command has switches, such as /f and /r, that change its basic behavior from reporting problems to repairing them. In either mode it performs the standard checks on the integrity of the file system. If chkdsk completes without reporting errors, you can be reasonably confident that the file system structure is intact; the contents of the database files are validated in the next level.
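For the Linux case, the check can be scripted roughly as shown below. This is a sketch under stated assumptions: the device path is hypothetical, the volume is assumed to be unmounted, and the -n option is assumed to be honored by the file system's checker (it is by e2fsck and many others, where it means "make no changes"). The exit-code meanings are those listed in the fsck(8) manual page.

```python
import subprocess

# Exit-code bits as documented in the fsck(8) manual page.
FSCK_BITS = {
    1: "errors corrected",
    2: "system should be rebooted",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user request",
    128: "shared-library error",
}


def explain_fsck_exit(code):
    """Decode an fsck exit code (a bitwise OR of conditions) into messages."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_BITS.items() if code & bit]


def check_filesystem(device):
    """Run a read-only check on an unmounted device, e.g. /dev/xvdf1 (hypothetical)."""
    # -n answers "no" to every repair prompt, so the volume is not modified.
    result = subprocess.run(["fsck", "-n", device], capture_output=True, text=True)
    return result.returncode, explain_fsck_exit(result.returncode)
```

Running the check read-only keeps the drill from altering the restored volume; an exit code containing bit 4 means the backup restored a file system with errors that still need repair.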
In addition, it’s important to note that if AWS detects a potential inconsistency in the volume’s data, I/O on the volume is disabled by default and you will need to re-enable it, for example from the console. Only then can you run the disk check.
Tip: You can use the AWS CLI command describe-volume-status (aws ec2 describe-volume-status), which returns the status of one or more volumes. Volume status provides the result of the checks AWS performs on your volumes to detect events that can impair their performance. The AWS documentation on monitoring EBS volumes covers these checks in more detail.
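The same information is available programmatically. The sketch below assumes boto3 and configured credentials; the function names and the default region are hypothetical, and the parsing helper relies only on the response shape of the EC2 DescribeVolumeStatus API.

```python
def volumes_needing_attention(response):
    """Return {volume_id: {"status": ..., "details": {...}}} for volumes not reported 'ok'."""
    report = {}
    for item in response.get("VolumeStatuses", []):
        status = item.get("VolumeStatus", {})
        if status.get("Status") != "ok":
            report[item["VolumeId"]] = {
                "status": status.get("Status"),
                # Detail names such as "io-enabled" mapped to their reported state.
                "details": {d["Name"]: d["Status"] for d in status.get("Details", [])},
            }
    return report


def check_volumes(volume_ids, region="us-east-1"):
    """Query EC2 for the given volumes (hypothetical helper; needs boto3 and credentials)."""
    import boto3  # imported lazily so the parsing helper works without it

    ec2 = boto3.client("ec2", region_name=region)
    return volumes_needing_attention(ec2.describe_volume_status(VolumeIds=volume_ids))
```

Note that a status of "insufficient-data" is also reported by this helper, since it means the checks have not finished yet; a drill script may prefer to retry in that case rather than fail.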
3. Database Validation
If fsck or chkdsk completes successfully, run a database validation, which is the third (application) level of testing. This type of check requires a tool that is built for the specific database you are running.
For example, in the case of MySQL, you can use mysqlcheck or myisamchk, both of which perform table maintenance by checking, repairing, optimizing, and analyzing tables. mysqlcheck works through a running MySQL server, while myisamchk operates directly on MyISAM table files and should be used when the server is not accessing them. Other products have their own utilities: Microsoft Exchange, for instance, provides Eseutil, which checks the Exchange Server database. For Oracle, you can also check out this list of 15 ways to validate database files and backups.
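For MySQL, a validation step might look like the sketch below. It is an illustration under assumptions, not a definitive implementation: it assumes the MySQL server on the recovered instance is running, that credentials come from an option file, and that mysqlcheck prints healthy tables in its default "db.table OK" form. The function names are hypothetical.

```python
import subprocess


def tables_not_ok(mysqlcheck_output):
    """Return the output lines that do not report a table as OK."""
    flagged = []
    for line in mysqlcheck_output.splitlines():
        line = line.strip()
        if line and not line.endswith("OK"):
            flagged.append(line)
    return flagged


def run_mysqlcheck(extra_args=()):
    """Run `mysqlcheck --check --all-databases` (hypothetical helper)."""
    cmd = ["mysqlcheck", "--check", "--all-databases", *extra_args]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, tables_not_ok(result.stdout)
```

Anything returned by tables_not_ok deserves a manual look; it includes informational notes (for example from storage engines that do not support CHECK TABLE) as well as real errors.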
As the range of processes and tools mentioned in this article shows, verification testing can be quite complicated. To make sure each level of testing is run properly, the backup validation process needs to be automated so that the tests run on a regular basis, giving you a system that continuously verifies your backup solution. In our next post on this topic, we will discuss the specific steps of automation; in particular, how Cloud Protection Manager supports automation by providing an API for instance recovery that can be used to facilitate the test automation process.
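As a rough illustration of how the three levels fit together in an automated drill, the sketch below runs a list of checks in order and stops at the first failure, since a file system or database check says little if the level beneath it has failed. The function name and the stage names are hypothetical; each check is any callable that returns True on success.

```python
def run_verification(stages):
    """Run (name, check) pairs in order; stop at the first check that fails."""
    results = []
    for name, check in stages:
        passed = bool(check())
        results.append((name, passed))
        if not passed:
            break  # later levels are not meaningful once an earlier one fails
    return results
```

A scheduler such as cron could call this with one callable per level (host, file system, database) and send a notification whenever the last recorded result is a failure.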