There are several ways to automate EBS snapshot creation. You can even do it yourself. One option is to create a script that runs at scheduled times, goes over your environment, and takes snapshots of all EBS volumes. Another is to write a script on each instance that takes snapshots of the attached EBS volumes. That script can be launched at a scheduled time as a cron job on Linux instances, or with Task Scheduler on Windows instances.
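As a rough illustration of the first approach, the sketch below walks every volume and starts a snapshot of each. It assumes an EC2 client object such as the one returned by `boto3.client("ec2")` is passed in; the function name `snapshot_all_volumes` and the description format are made up for this example.

```python
from datetime import datetime, timezone


def snapshot_all_volumes(ec2, now=None):
    """Start a snapshot of every EBS volume the client can see.

    `ec2` is assumed to behave like a boto3 EC2 client.
    Returns the IDs of the snapshots that were started.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    snapshot_ids = []
    # describe_volumes is paginated, so walk every page of results.
    for page in ec2.get_paginator("describe_volumes").paginate():
        for volume in page["Volumes"]:
            response = ec2.create_snapshot(
                VolumeId=volume["VolumeId"],
                Description=f"Automated backup of {volume['VolumeId']} at {stamp}",
            )
            snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids
```

With `boto3` installed and credentials configured, a scheduled job could call `snapshot_all_volumes(boto3.client("ec2"))`. Error handling, tagging, and retention are left out here on purpose.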
Either way, the scripts will get harder to manage as your environment gets bigger, more elaborate, and more dynamic. Alternatively, you can use an off-the-shelf snapshot automation product, such as N2WS Backup & Recovery, which provides a comprehensive solution for AWS EC2 backup. In this article, we will discuss the main considerations of leveraging the snapshot mechanism to automate backups and ultimately build a robust backup process.
5 Main Considerations
Keep the following five challenges and considerations in mind when creating a robust automated backup solution with EBS snapshots:
1. Adaptive and Dynamic
Enterprises that use the cloud deal with a very volatile and dynamic environment. Instances are deployed and terminated automatically on a continuous basis. Therefore, your automation mechanism needs to be adaptive and flexible to ensure that data and application replicas stay up to date with recent changes in your environment. Manual methods that rely on copying a script into an instance before it has been configured are extremely cumbersome and prone to human error.
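One common way to stay adaptive is to choose what to back up by tag at run time, so newly launched instances are picked up without copying anything onto them. The sketch below assumes instances opt in with a hypothetical `Backup=true` tag, and that `reservations` is the `Reservations` list from an EC2 `describe_instances` response.

```python
def volumes_to_back_up(reservations, tag_key="Backup", tag_value="true"):
    """Return IDs of EBS volumes attached to instances carrying the opt-in tag.

    The tag key and value are illustrative defaults, not an AWS convention.
    """
    volume_ids = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get(tag_key) != tag_value:
                continue  # instance has not opted in to backups
            for mapping in instance.get("BlockDeviceMappings", []):
                ebs = mapping.get("Ebs")
                if ebs:  # skip any mapping without EBS details
                    volume_ids.append(ebs["VolumeId"])
    return volume_ids
```

Because the list is rebuilt on every run, terminated instances drop out and new ones appear automatically, as long as they are tagged at launch.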
2. Reliable
Snapshot automation matters most for critical production volumes. Because backups are such an important part of a production environment, you need to know that they are taking place as planned. But how can you make sure that your automation scripts are reliable, and how can you get notifications when something fails?
The answer lies in putting mechanisms in place that report errors as they occur. When a script is running and hits a problem, it needs to report the failure. But what if the script doesn’t run in the first place? What if the cron job was disabled on a specific instance a couple of months ago, and the failure is only now being discovered?
It is imperative to ensure that everything is running in the right place, at the right time, and most importantly…successfully. At the end of the day, the reliability of your snapshot automation scripts is directly related to the reliability of your system as a whole.
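A script that never runs cannot report its own failure, so one option is an independent check that looks at the results rather than the job. The sketch below flags volumes whose newest completed snapshot is missing or older than an allowed age. It assumes `snapshots` is the `Snapshots` list from an EC2 `describe_snapshots` response; the function name and the way alerts get delivered are left as assumptions.

```python
from datetime import datetime, timedelta, timezone


def find_stale_volumes(volume_ids, snapshots, max_age, now=None):
    """Return volumes with no completed snapshot newer than `max_age`."""
    now = now or datetime.now(timezone.utc)
    newest = {}  # volume ID -> start time of its newest completed snapshot
    for snap in snapshots:
        if snap.get("State") != "completed":
            continue  # pending or failed snapshots do not count as backups
        vol = snap["VolumeId"]
        if vol not in newest or snap["StartTime"] > newest[vol]:
            newest[vol] = snap["StartTime"]
    return sorted(
        v for v in volume_ids
        if v not in newest or now - newest[v] > max_age
    )
```

Running a check like this from somewhere other than the backed-up instances means a disabled cron job shows up as a stale volume within one check interval instead of months later.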
3. Manageable at Scale
Imagine you have numerous EBS volumes attached to various instances. How will you gain control over such a large environment? How can you know that all 200 of your EC2 instances are being backed up frequently? What if you wish to treat some parts of your environment differently from the rest? When trying to manage your environment on a large scale, new challenges are inevitable. In this case, a dashboard that lets you easily monitor what is happening in your backup environment is ideal.
This aspect is particularly relevant to successful, fast-growing environments. By leveraging the flexibility of AWS, you can easily grow your environment from tens to hundreds or even thousands of instances in a short period of time. Companies should always be prepared for sudden growth, which may require revamping backup and disaster recovery processes to support the larger environment.
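To make the idea of treating parts of the environment differently more concrete, here is a small sketch of the data a backup dashboard might show. The policy names, the intervals, and the `Environment` tag are all hypothetical; `instances` is assumed to be a list of instance records as returned by EC2 `describe_instances`, and `protected_ids` a set of instance IDs known to have recent backups.

```python
from collections import Counter

# Hypothetical policies: maximum hours allowed between snapshots.
POLICIES = {"production": 1, "staging": 12, "development": 24}
DEFAULT_POLICY = "development"


def policy_for(instance):
    """Pick a policy name from the instance's Environment tag."""
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
    env = tags.get("Environment", DEFAULT_POLICY)
    return env if env in POLICIES else DEFAULT_POLICY


def summarize(instances, protected_ids):
    """Count protected and unprotected instances per policy."""
    summary = {name: Counter() for name in POLICIES}
    for instance in instances:
        protected = instance["InstanceId"] in protected_ids
        status = "protected" if protected else "unprotected"
        summary[policy_for(instance)][status] += 1
    return summary
```

A per-policy count like this stays readable whether the environment has 20 instances or 2,000, which is the point of a summary view.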
4. Fast Recovery
Let’s say that everything is in order and snapshots are being taken regularly. All of the channels are appropriately managed and everything is working as planned. Suddenly an instance crashes and you need to dig out the latest snapshot. How straightforward is it to find the correct snapshot and recover the instance or volume? Will you have to start tirelessly sifting through snapshots and information? What if the instance was fully managed, but someone terminated it hours ago and it has since completely vanished?
Finding the correct snapshot to restore isn’t as easy as you might think. Maybe the search needs to go back a couple of days or more, while time is ticking and the service is still down. Recovering from the wrong snapshot could cause even more damage. Therefore, a fast and well-defined recovery mechanism is extremely important to keep downtime to a minimum.
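The lookup itself can be reduced to a small, testable rule: take the newest completed snapshot of the volume that started at or before the moment you want to return to. The sketch below assumes `snapshots` is the `Snapshots` list from an EC2 `describe_snapshots` response; the function name is made up for this example.

```python
def latest_snapshot_before(snapshots, volume_id, point_in_time):
    """Return the newest completed snapshot of `volume_id` that started
    at or before `point_in_time`, or None if there is no such snapshot."""
    candidates = [
        s for s in snapshots
        if s["VolumeId"] == volume_id
        and s.get("State") == "completed"
        and s["StartTime"] <= point_in_time
    ]
    return max(candidates, key=lambda s: s["StartTime"], default=None)
```

Snapshots outlive the volume and the instance they came from, so this still works after a termination, provided you recorded which volume IDs belonged to the instance (for example, as tags on the snapshots).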
5. Consistent
How do you make sure that your snapshots are consistent and fully recoverable? Let’s say, for example, that you back up an instance by creating an image of it, which snapshots its EBS volumes and, by default, reboots the instance first. Your snapshots will most likely be consistent and recoverable. However, you will also have experienced a certain amount of downtime during the reboot, which could be critical for your system as a whole.
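One place this trade-off shows up in the EC2 API is image creation, where the `NoReboot` parameter of `create_image` controls whether the instance is rebooted before its volumes are snapshotted. The wrapper below is only a sketch: `back_up_instance` and `allow_reboot` are names invented here, and `ec2` is assumed to behave like a boto3 EC2 client.

```python
def back_up_instance(ec2, instance_id, name, allow_reboot=True):
    """Create an image (and thus EBS snapshots) of an instance.

    allow_reboot=True  -> NoReboot=False: brief downtime, but the file
                          system is captured in a clean state.
    allow_reboot=False -> NoReboot=True: no downtime, but file system
                          integrity of the image is not guaranteed.
    """
    response = ec2.create_image(
        InstanceId=instance_id,
        Name=name,
        NoReboot=not allow_reboot,
    )
    return response["ImageId"]
```

Which side of the trade-off to choose depends on the workload. A production database may need application-level quiescing rather than either option, which is outside the scope of this sketch.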
A snapshot is essentially just one building block of a backup automation solution. To implement your backup policy successfully, you need to be familiar with best practices and align your policy with the SLA of each environment (e.g., production or development). In our next article, we will discuss how this can be done. Find out more best practices on AWS storage and cloud backup solutions in our library of AWS how-to guides and blog.
Also, if you’re interested in automating your Amazon EC2 environment, check out N2WS Backup & Recovery, our enterprise-class backup, recovery and disaster recovery solution for Amazon EC2 instances, EBS volumes, RDS databases and Redshift clusters.