Disaster Recovery (DR) is a process that helps you prepare for, and recover from, events that disrupt your services. The DR process is designed to reduce the negative impact on a company when any service is lost. In the production world of scalable applications, DR is just as important as any other nonfunctional requirement. In this article, we will discuss how to use CPM effectively to achieve proper Disaster Recovery for AWS.
In the early days, organizations selected the cloud as their DR option. However, with the growing popularity and adoption of the cloud, cloud vendors have had to offer DR mechanisms for native cloud applications as well. For that purpose, AWS has not only a scalable and comprehensive snapshot mechanism but also a presence in 13 regions and 35+ availability zones (at the time of writing), which makes it well suited to support backup and DR. AWS offers on-demand, scalable, pay-as-you-go infrastructure that also helps in planning the DR process.
In this article, we will briefly describe the different DR options on AWS as well as provide you with a step-by-step guide to automating your AWS cross-regions DR considering your RTO and RPO preferences.
RTO and RPO: The Basics
One of the key aspects of DR is identifying your RTO & RPO to help design the right DR solution. Let’s first understand the two key SLA parameters for disaster recovery.
RTO stands for Recovery Time Objective. It is the maximum acceptable time for an organization to restore a service after a disaster has disrupted it. For example, if a site goes down due to a major issue at 11:30 PM and the SLA says that it must be back up by 1:30 AM at the latest, the RTO is two hours.
RPO stands for Recovery Point Objective. It defines the maximum amount of data that can be lost, measured in time. For example, if a site goes down due to a major issue at 11:30 PM and the SLA says that, at most, the data written between 10:30 PM and 11:30 PM may be lost, the RPO of the site is one hour, and the system should be backed up at least every hour to meet it.
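To make the two definitions concrete, here is a minimal Python sketch that checks a single outage against an RTO and an RPO. The function names and the sample timestamps are made up for illustration; only the two-hour RTO and one-hour RPO come from the examples above.

```python
from datetime import datetime, timedelta


def meets_rto(outage_start, service_restored, rto):
    """True if the service was restored within the RTO window."""
    return service_restored - outage_start <= rto


def data_loss_window(last_backup, outage_start):
    """Everything written after the last backup is lost in the outage."""
    return outage_start - last_backup


def meets_rpo(last_backup, outage_start, rpo):
    """True if the data loss window does not exceed the RPO."""
    return data_loss_window(last_backup, outage_start) <= rpo


# Hypothetical incident: outage at 11:30 PM, last backup at 10:45 PM,
# service restored at 1:15 AM the next day.
outage = datetime(2016, 10, 1, 23, 30)
last_backup = datetime(2016, 10, 1, 22, 45)
restored = datetime(2016, 10, 2, 1, 15)

print(meets_rto(outage, restored, timedelta(hours=2)))     # True
print(data_loss_window(last_backup, outage))               # 0:45:00
print(meets_rpo(last_backup, outage, timedelta(hours=1)))  # True
```

The RPO check shows why the backup interval must not exceed the RPO: with hourly backups, the last backup is never more than one hour older than the outage.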
When we talk about DR, there are various options available for achieving different levels of DR on AWS including Pilot Light, Hot Standby, and Active-Active / Multi-site setup. You can learn more about this here.
RTO & RPO come with a cost. With various kinds of DR options mentioned, you can see below that the lower the RTO/RPO is, the higher the cost of the infrastructure (DR process) will be.
A company should determine its RPO & RTO plans based on the costs involved and its liability to its users. AWS itself offers an SLA of 99.95% for many of its services, which accounts for unforeseen disasters. If you require a higher SLA level, automated cross-region backup and a failover mechanism might be important requirements.
How to Automate DR on AWS with CPM
Cloud Protection Manager (CPM) is a full-featured enterprise-class backup, recovery, and DR solution for Amazon EC2 Instances, EBS volumes, and RDS databases, utilizing the AWS native EBS and RDS snapshots.
As shown below, we have deployed a production LAMP stack on AWS EC2. For our demo, the additional volume has a MySQL database installed. We will show how to back up and recover across regions.
1. Considering that we are planning an RPO of one hour, we have configured the CPM policy to schedule an automated backup every hour.
2. As shown below, CPM is configured to back up both the root volume and the additional EBS volume of the instance.
3. For the DR, we are planning to have a copy of the snapshot available in a separate region.
As mentioned in this article, you can use AWS tools to copy the snapshots manually or use CPM, which does it automatically.
As shown below, when using CPM, you can easily set a policy to create an automated copy of the snapshot to any of the other AWS regions.
To enable that, select one or more target regions and set the “performing DR” frequency in terms of backups. You may want to copy the snapshots of every backup to other regions, which is the default option. However, this might impact your storage costs, so it is better to set this to match your own specific requirements.
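For reference, the manual route could look roughly like the sketch below, which uses the AWS SDK for Python (boto3) rather than CPM. It assumes boto3 is installed and AWS credentials are configured; the helper names, regions, and snapshot ID are placeholders of ours. Note that `copy_snapshot` is called on a client created in the destination region and is told where the source snapshot lives.

```python
def make_ec2_client(region):
    """Create an EC2 client for the given region (needs boto3 and credentials)."""
    import boto3  # third-party AWS SDK, imported lazily so the sketch loads without it
    return boto3.client("ec2", region_name=region)


def copy_snapshot_to_region(dest_client, snapshot_id, source_region, description=""):
    """Start a cross-region copy and return the ID of the new snapshot.

    dest_client must be an EC2 client bound to the destination region.
    The call returns immediately; the copy itself continues in the background.
    """
    response = dest_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=description or f"DR copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]


# Usage (placeholder regions and snapshot ID):
#   client = make_ec2_client("us-west-2")
#   new_id = copy_snapshot_to_region(client, "snap-0123456789abcdef0", "us-east-1")
```

Doing this by hand for every hourly snapshot of every volume quickly becomes tedious, which is the part a policy-driven tool takes over.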
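The trade-off behind the DR frequency can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are ours, and the assumption that every Nth backup is the one copied is a simplification, not a description of how CPM counts internally.

```python
from datetime import timedelta


def backups_copied_to_dr(total_backups, dr_every_n):
    """1-based indices of the backups that are also copied to the DR region."""
    return [i for i in range(1, total_backups + 1) if i % dr_every_n == 0]


def effective_dr_rpo(backup_interval, dr_every_n):
    """Worst-case age of the newest snapshot available in the DR region."""
    return backup_interval * dr_every_n


# Hourly backups, with every 6th backup copied to the DR region.
print(backups_copied_to_dr(24, 6))              # [6, 12, 18, 24]
print(effective_dr_rpo(timedelta(hours=1), 6))  # 6:00:00
```

Copying less often cuts the number of snapshots stored and transferred, but the RPO you can guarantee after losing the whole source region grows by the same factor.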
In addition, the “DR Timeout” is required because the snapshot copy happens over the WAN, and its duration depends on the volume of data. If the DR process does not complete within the configured timeframe (in hours), CPM assumes the process is hanging and declares that the copy has failed. CPM also provides an advanced option to run backups across AWS accounts.
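The idea of a timeout on a long-running copy can be illustrated with a small polling loop. This is a generic sketch and not CPM's implementation; `get_state` is a hypothetical callable that returns the copy's current state, using EBS snapshot state names such as "pending", "completed", and "error".

```python
import time


def wait_for_copy(get_state, timeout_hours, poll_seconds=60,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll a snapshot copy until it finishes or the timeout expires.

    Returns True if the copy completed, False if it failed or timed out.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_hours * 3600
    while True:
        state = get_state()
        if state == "completed":
            return True
        if state == "error":
            return False
        if clock() >= deadline:
            # Treat a copy that is still pending after the timeout as failed.
            return False
        sleep(poll_seconds)
```

A timeout that is too short marks large but healthy copies as failed, so it makes sense to size it against the amount of data you expect to transfer.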
As shown in the image below, once the policy is configured, CPM will automatically start taking a snapshot every hour and copy it over to the target region.
We can check the logs of the completed snapshots for more information:
Here, it is important to note that when a snapshot is copied to a separate region, it is copied over the WAN. This is a slower process than copying a snapshot to another availability zone in the same region. AWS also charges a data transfer fee for copying data from one region to another. As of October 2016, AWS charges $0.02 per GB for transfers from the US East region to any other region.
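A rough estimate of the transfer fee follows directly from that rate. The sketch below is illustrative only: the function name and the sample data volumes are assumptions, the $0.02 per GB default is the October 2016 US East figure quoted above, and because EBS snapshots are incremental, the amount transferred per copy is usually the changed data rather than the full volume size.

```python
def monthly_transfer_cost(gb_per_copy, copies_per_day, price_per_gb=0.02, days=30):
    """Estimated cross-region data transfer cost in USD for one month."""
    return gb_per_copy * copies_per_day * days * price_per_gb


# Hypothetical workloads: 5 GB of changes per hourly copy
# versus 40 GB of changes per daily copy.
hourly = monthly_transfer_cost(gb_per_copy=5, copies_per_day=24)
daily = monthly_transfer_cost(gb_per_copy=40, copies_per_day=1)

print(f"hourly copies: ${hourly:.2f} per month")  # hourly copies: $72.00 per month
print(f"daily copies:  ${daily:.2f} per month")   # daily copies:  $24.00 per month
```

This covers the transfer fee only; snapshot storage in the target region is billed separately.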
CPM gives a quick and easy option to recover the instance within a target backup region.
Select the region where you keep your backups and click the Recover Instance option:
Configure the parameters as needed. Recovery of an EBS volume is pretty straightforward, but for an instance, you will have additional options.
As shown below, you will need to set parameters such as the instance type, VPC (i.e. subnet), key pair, and security group.
Once all parameters are configured correctly, CPM will auto-restore the instance in the target region.
In this article, we have achieved automated cross-region replication. It is important to note that RPO & RTO come with a cost, including snapshot storage costs as well as data transfer costs. With this in mind, you should define the required uptime and plan your backup and DR frequency accordingly. For example, you may plan a daily copy to a separate region instead of copying every hourly backup, which is the CPM default. We strongly recommend that you learn how to estimate your snapshot costs.
Finally, CPM facilitates automation and management of your backup and recovery, providing you the ultimate protection and confidence that your data and applications are safe.
CPM is an enterprise-class backup, recovery, and disaster recovery solution for EC2. It uses the existing EBS volume and RDS database snapshot capabilities to take snapshots automatically at regular intervals. Additionally, you can set up policies and schedule backups for various targets. CPM also helps manage snapshots with policies. For example, if you have multiple snapshots, the older ones may no longer be relevant. With CPM, you can configure a policy to delete snapshots after a certain period, which helps with cost savings and effective backup management. Learn more about CPM