Not long ago, ‘CDP’ was the main buzzword in the field of data protection. The acronym stands for continuous data protection (also called real-time backup) and denotes a process of continuous backup: every write to disk is asynchronously recorded to an associated backup mechanism, which enables recovery to any point in time. While this may resemble replication, CDP does not replicate the disk onto a standby machine. Instead, it records the writes to a backup repository, so recovering data requires a restore process.
While CDP was a hot topic for quite a while, it proved difficult to implement successfully because of the resources it requires. First, every I/O that is written must be redirected and recorded, which affects the performance of the I/O system, not to mention the storage capacity needed to hold the data. The repository generally resides in a different location from the host, which requires networking resources and adds further to the weight and cost of the solution. Second, one advantage of standard backup procedures is that consistency methods can be applied, choosing an exact point in time at which the application and its data are consistent. True CDP offers no such option, so while returning to the most recent point in time may be possible, returning to a specific, consistent point in time is more challenging.
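To make the idea of recording writes and recovering to any point in time concrete, here is a minimal sketch. It is a toy model, not how any CDP product is implemented: the `WriteJournal` class, its method names, and the in-memory log are all assumptions made for illustration, and it assumes writes are recorded in time order.

```python
from dataclasses import dataclass, field


@dataclass
class WriteJournal:
    """Toy CDP repository: an append-only log of (timestamp, block, data)."""

    entries: list = field(default_factory=list)

    def record(self, timestamp: float, block: int, data: bytes) -> None:
        # A real CDP system would capture every disk write asynchronously;
        # here we simply append to a list (assumed to stay in time order).
        self.entries.append((timestamp, block, data))

    def restore(self, point_in_time: float) -> dict:
        """Rebuild the disk contents as of point_in_time by replaying the log."""
        disk = {}
        for timestamp, block, data in self.entries:
            if timestamp > point_in_time:
                break  # later writes are not part of the requested state
            disk[block] = data
        return disk


journal = WriteJournal()
journal.record(1.0, 0, b"A")
journal.record(2.0, 1, b"B")
journal.record(3.0, 0, b"C")  # overwrites block 0
```

Calling `journal.restore(2.5)` replays only the first two writes, which shows why recovery requires a restore step rather than a simple failover to a standby copy.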
So, What Is Near-CDP?
These obstacles led to solutions that are not true CDP but rather near-CDP: they use very frequent backups, approaching CDP in terms of recovery point objective (RPO), without being as expensive or resource-consuming. Although near-CDP still backs up at a high frequency, each backup can be made consistent, and it has proven more practical than true CDP.
Certain critical applications require such a high level of data protection that they cannot afford to lose a single transaction or I/O. Examples include the systems of a bank or an internet payment service such as PayPal, where each lost transaction translates into lost money.
Such applications typically use synchronous replication to ensure that data is never recorded in only one location. A backup solution may then be applied in addition to replication, in order to better respond to logical data loss and to keep a history of changes. Somewhat less critical applications can manage with a near-CDP solution that provides data from a short while back when recovery is needed.
I have seen products that offer both true CDP and near-CDP, and most customers end up using near-CDP because it is the more practical option in terms of its effect on performance and resources. Even though the cloud is automated and offers seemingly infinite resources, true CDP is not necessarily the best solution there. Resources may be effectively endless in the cloud, but they are still expensive, and storage performance remains an issue, as is apparent from AWS’s frequent release of new storage features such as SSD-backed storage. Once storage performance reaches the appropriate levels, true CDP may well become a reasonable option, given the cloud’s scale and automation.
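The RPO trade-off can be illustrated with a little arithmetic: with periodic backups, the data lost in a failure is whatever was written since the last backup, so the worst case approaches the backup interval. The sketch below assumes a hypothetical five-minute schedule; the function names are illustrative only.

```python
from bisect import bisect_right


def recovery_point(snapshot_times, failure_time):
    """Return the latest snapshot time at or before failure_time, or None."""
    i = bisect_right(snapshot_times, failure_time)
    return snapshot_times[i - 1] if i else None


def data_loss_window(snapshot_times, failure_time):
    """Seconds of writes lost when recovering to the latest usable snapshot."""
    point = recovery_point(snapshot_times, failure_time)
    return None if point is None else failure_time - point


# Assumed schedule: one snapshot every 5 minutes for an hour (times in seconds).
snapshots = list(range(0, 3600, 300))
```

With this schedule, a failure at second 1499 recovers to the snapshot at second 1200 and loses 299 seconds of writes, just under the five-minute interval.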
Near-CDP with EBS Snapshots
In AWS, EBS snapshots serve as an efficient and cost-effective backup infrastructure. Because they are relatively fast, block-level, incremental snapshots of disks, they provide a stable foundation for near-CDP solutions, and they can be taken automatically at a very high rate, as often as every few minutes. Because the snapshots are incremental, taking them at a high frequency makes each one much smaller, since it captures only the blocks that changed since the previous snapshot. Combined with the speedy restore process for EBS volumes, this can make a near-CDP solution a good fit, so long as the high frequency of snapshots can be managed.
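The following sketch shows why frequent incremental snapshots stay small. It is a conceptual model only, not a description of how EBS works internally: a disk is modeled as a dictionary of block number to contents, the function names are hypothetical, and block deletion is deliberately ignored.

```python
def incremental_snapshot(current, previous):
    """Keep only the blocks that differ from the previous disk state."""
    return {
        block: data
        for block, data in current.items()
        if previous.get(block) != data
    }


def restore_chain(snapshots):
    """Rebuild a disk by applying a full snapshot, then each increment in order."""
    disk = {}
    for snapshot in snapshots:
        disk.update(snapshot)
    return disk


# Three successive disk states, each differing by one block.
state0 = {0: b"a", 1: b"b", 2: b"c"}
state1 = {0: b"a", 1: b"B", 2: b"c"}
state2 = {0: b"a", 1: b"B", 2: b"c", 3: b"d"}

full = incremental_snapshot(state0, {})      # first snapshot holds every block
inc1 = incremental_snapshot(state1, state0)  # only block 1 changed
inc2 = incremental_snapshot(state2, state1)  # only block 3 was added
```

Each increment here holds a single block, while the chain as a whole still restores the latest state.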
For example, a database usually keeps its data, transaction logs, and system software on separate disks. Because snapshots are taken at the volume level, this allows a high level of granularity. A near-CDP policy can follow the application’s data update behavior: very frequent snapshots are taken only of the transaction log disk, which changes most often, while the other volumes are snapshotted at a lower rate. This way, the most recent transaction logs are always available. Even if the main database is backed up on a less frequent schedule, one can always apply the latest transaction logs during recovery. This method can save considerable cost for large, critical databases.
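A per-volume policy of this kind might be sketched as follows. The volume roles and intervals are assumptions chosen for illustration, not recommended values, and the function only decides which volumes are due. Actually taking the snapshot would be a separate call to the EC2 API, which is not shown here.

```python
from datetime import datetime, timedelta

# Hypothetical policy: how often each volume role should be snapshotted.
POLICY = {
    "transaction-logs": timedelta(minutes=5),
    "data": timedelta(hours=4),
    "system": timedelta(hours=24),
}


def volumes_due(policy, last_snapshot, now):
    """Return the roles whose interval has elapsed since their last snapshot."""
    due = []
    for role, interval in policy.items():
        last = last_snapshot.get(role)
        # A volume that was never snapshotted is always due.
        if last is None or now - last >= interval:
            due.append(role)
    return due


now = datetime(2024, 1, 1, 12, 0)
last = {
    "transaction-logs": now - timedelta(minutes=6),
    "data": now - timedelta(hours=1),
    "system": now - timedelta(hours=2),
}
```

In this example only the transaction log volume is due, so the frequent snapshots touch the smallest and fastest-changing disk while the larger volumes wait for their own schedule.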
An additional advantage, as mentioned above, is quick recovery. Some other CDP solutions back up to a remote repository. Without an instant restore feature, all of the data must be copied back during recovery, which will most likely lengthen the recovery time, whereas a snapshot-based recovery can come close to a zero recovery time objective (RTO).
CDP is a very powerful, yet sometimes expensive, solution. While true CDP has proven somewhat problematic in the current cloud environment, cloud storage should become better suited to it in the future. For now, with good backup tools and methodologies, AWS customers can use the EBS snapshot mechanism to build a near-CDP solution and make their cloud-based applications considerably more resilient.
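A back-of-the-envelope estimate shows how much a full copy can add to recovery time. The figures below (500 GiB of data, 100 MiB/s of sustained throughput) are assumed purely for illustration, and the function name is hypothetical.

```python
def copy_restore_seconds(size_gib: float, throughput_mib_s: float) -> float:
    """Time to copy size_gib of data back at a sustained throughput_mib_s."""
    return size_gib * 1024 / throughput_mib_s


# Assumed example: 500 GiB restored over a link sustaining 100 MiB/s.
seconds = copy_restore_seconds(500, 100)
minutes = seconds / 60
```

Under these assumptions the copy alone takes 5120 seconds, roughly 85 minutes, before the application can even start.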
Cloud Protection Manager (CPM) is an enterprise-class backup solution for EC2 based on EBS and RDS snapshots. It supports flexible schedules and can manage near-CDP backup policies. CPM is sold on AWS Marketplace with prices ranging from $62.5/month to $500/month. See pricing or try it for free.