Enterprises are moving more and more applications and data centers into the public cloud. This trend, which began a few years ago, doesn’t seem likely to stop in the foreseeable future. This migration also includes mission-critical IT resources and services.
The AWS vision of a cloud architecture, where you use EC2 instances only as compute blocks and keep all persistent data in managed services like S3, RDS, and DynamoDB, has a lot of advantages. But as more complex environments move to the cloud, we see a broad range of applications running on EC2 instances: from Windows applications, such as SQL Server and SharePoint, to Linux servers running applications such as full-scale Oracle servers, MongoDB clusters, and MySQL. Enterprises put their critical workloads on EC2 instances, using EBS volumes for their persistent storage in the cloud.
When moving actual servers to the public cloud (in this case AWS), enterprise IT leaders naturally expect and look for solutions that are comparable to the ones they had in their traditional data centers, in terms of management, security, and data protection. When using EBS for persistent storage, leveraging EBS snapshots in a manner similar to traditional hardware snapshots is a must for any critical application. Existing hardware snapshot-based backup solutions in traditional data centers (for example, EMC, NetApp, or IBM) offer a rich feature set.
These key features are required to protect critical production applications:
- Policy-based decision making: The ability to group resources in logical units and decide how to back up those resources: scheduling, application consistency, retention windows, behavior in case of failures, etc. In terms of scheduling, a wide set of options is needed to carefully control application Recovery Point Objectives (RPO), from weekly or daily backups to high-frequency, near-CDP backups.
- Application-consistent backup: For critical applications, it is important to make sure backups and snapshots are consistent and will not fail to recover properly when needed. To this end, a backup solution must be able to interact with various types of applications and configurations, including databases, file systems, clusters, and arrays containing multiple disks (in this case EBS volumes). See the posts about Oracle, MongoDB, and LVM.
- Complete control and monitoring of backup operations: When managing production applications, it is crucial to know what’s going on, especially when something goes wrong. Although snapshot failure is rare, application consistency operations fail more frequently, for various reasons. Application consistency commands can fail if there are connectivity problems with the servers and, even more often, if the server is under a high activity level (peak time). Or, even worse, application consistency commands can simply hang or interfere with the production application. A tier-1 application backup solution needs to be able to detect such situations and respond to them based on actions determined by policy. Possible actions include failing the backup and trying again later, notifying DevOps, and so on.
- Alerts & Notifications: A backup solution needs to give a clear view of any issues the solution is experiencing and provide notifications when something bad happens. If a consistent backup fails, or keeps failing, it is imperative that the relevant IT/DevOps person, or a group of people, be notified. In addition, the backup solution needs to be able to integrate with monitoring solutions used by other operations (e.g., NOC), and any alerts need to be sent to such a monitoring solution as well.
- Disaster Recovery: An outage is a rare event in AWS, but it has happened in the past and will probably happen again. Critical tier-1 applications need to recover quickly during an outage, minimizing downtime that can cause serious damage to businesses. In terms of AWS or EC2, applications need to be able to recover to a different Availability Zone than their original one and, for more serious outages, to a remote AWS region. The probability of an outage occurring in more than one AWS region at the same time borders on zero.
- Rapid and Granular Recovery: Downtime, whether caused by data loss or an outage, is a very stressful time for IT/DevOps teams. The recovery process must be quick, easy, and error-free. The team needs to identify the backup to recover from, recover using a simple operation that avoids human error, and complete the recovery quickly to minimize downtime and achieve the required Recovery Time Objective (RTO). Furthermore, it is important to have granular recovery options, so that recovery can be performed at different levels (e.g., whole application/server level, single disk/volume level).
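To make a few of these requirements concrete, here is a minimal Python sketch of how a policy, a retention window, and a guarded application consistency command might fit together. It is an illustration only, not how any particular product implements them. Every name in it (`BackupPolicy`, `run_consistency_command`, `backups_to_delete`, `is_backup_due`) is hypothetical, and it deliberately leaves out the actual snapshot calls to AWS.

```python
import subprocess
import sys
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical policy: one logical group of volumes and how to back it up."""
    name: str
    volume_ids: Tuple[str, ...]   # EBS volumes belonging to one application
    interval: timedelta           # backup frequency, which bounds the RPO
    retention: int                # number of most recent backups to keep
    quiesce_timeout_s: float      # how long a consistency command may run
    on_failure: str = "retry"     # policy-defined reaction: "retry" or "notify"


def run_consistency_command(cmd: List[str], timeout_s: float) -> bool:
    """Run a quiesce/flush command; a hang, launch error, or non-zero exit is a failure."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except (subprocess.TimeoutExpired, OSError):
        # The command hung or could not start: give up instead of blocking production.
        return False
    return result.returncode == 0


def backups_to_delete(backup_times: List[datetime], policy: BackupPolicy) -> List[datetime]:
    """Return the backups that fall outside the retention window, oldest first."""
    ordered = sorted(backup_times)
    excess = len(ordered) - policy.retention
    return ordered[:excess] if excess > 0 else []


def is_backup_due(last_backup: Optional[datetime], now: datetime, policy: BackupPolicy) -> bool:
    """A backup is due if none exists yet or the policy interval has elapsed."""
    return last_backup is None or now - last_backup >= policy.interval


if __name__ == "__main__":
    policy = BackupPolicy("oracle-prod", ("vol-aaa", "vol-bbb"),
                          timedelta(hours=1), retention=3, quiesce_timeout_s=5.0)
    # Stand-in for a real quiesce script: a Python process that exits cleanly.
    ok = run_consistency_command([sys.executable, "-c", "pass"], policy.quiesce_timeout_s)
    print("consistent" if ok else f"consistency failed, action: {policy.on_failure}")
```

The timeout is the important design choice here: a consistency command that hangs is treated the same as one that fails, so the policy's failure action (retry later, notify DevOps) takes over instead of leaving the production application stalled.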
A reliable and robust backup solution is imperative for any production application and especially critical tier-1 applications. Having such a solution can serve as an enabler for companies to move such applications into EC2; conversely, the lack of a backup solution can be a barrier for such a move.
N2WS Backup & Recovery (CPM) is an enterprise-class backup, recovery and disaster recovery solution for EC2, allowing protection of critical applications in the EC2 cloud.