In the previous post, Backup vs. Replication in the Cloud, part I: Concepts, we saw the differences between backup and replication solutions. In this post we’ll look into specific data loss scenarios.
Let’s inspect a few different data loss/data recovery scenarios in cloud environments and see how backup solutions and replication solutions may help:
- Server crash: A server (running on a cloud VM/instance) crashes and disappears. This could be the result of a hardware failure or of a software/OS failure. A backup solution can recover the server to the time of its last backup. Depending on the backup schedule, the recovered data can be a day old, a few hours old, or only a few minutes old. A replication solution can typically get you the most recent data. However, if the crash was caused by data corruption (e.g. corrupted system files), the replicated copy may be corrupted as well, making data recovery problematic. With backup, if the latest backup is corrupted, you can revert to an earlier backup taken before the corruption. Recovery time is about the same in both cases: a few minutes (assuming the backup solution is snapshot-based).
- Cloud outage: One or more servers disappear because of an outage in the cloud provider’s data center. This can be a local, partial outage (e.g. an availability zone within an AWS region) or an outage of an entire data center. In this case, a replication solution will recover the environment to its most recent state. Recovery with a backup solution will include some data loss, but with a snapshot-based backup solution the loss may be limited to a few minutes. Recovery time is about the same with replication and with snapshot-based backup solutions.
- Data loss as a result of human error: Someone accidentally deletes half the application’s data by running an erroneous script or clicking the wrong button. Replication helps only until the deletion itself is replicated, which can happen almost instantaneously. Moreover, when logical corruption is the issue, you would probably not want to fail over to an alternate system, but rather recover the data into the existing system. Backup can help you with that recovery, although the backup copy will be somewhat older than the latest replicated copy.
- Data loss as a result of a malicious attack: Many applications in cloud environments are web-facing, and therefore potentially vulnerable to attack. This scenario is similar to the human error scenario, with one key distinction: when it occurs, you don’t want to go back just a little way. Instead, you need to find the time of the attack or infection and recover to a point before it, keeping in mind that the time of the attack is not necessarily the time at which the data loss was discovered. In this case, a backup solution will probably provide the best answer.
- Accessing deleted data: Sometimes there’s a need to access data that has already been deleted on purpose, be it a whole volume, a folder, a document or an email. This scenario is not addressed at all by replication and requires a backup solution.
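To make the malicious attack scenario more concrete, here is a minimal Python sketch of choosing a recovery point. It assumes you have a list of backup (or snapshot) timestamps and an estimate of when the attack happened. The function name `latest_backup_before` and the sample timestamps are hypothetical and only for illustration; they are not part of any particular backup product.

```python
from bisect import bisect_left
from datetime import datetime


def latest_backup_before(backup_times, cutoff):
    """Return the most recent backup taken strictly before `cutoff`, or None."""
    ordered = sorted(backup_times)
    # bisect_left gives the index of the first backup at or after the cutoff,
    # so the element just before that index is the newest one before it.
    idx = bisect_left(ordered, cutoff)
    return ordered[idx - 1] if idx > 0 else None


# Hypothetical data: hourly backups from 00:00 to 05:00.
backups = [datetime(2024, 1, 1, hour) for hour in range(6)]

# The data loss was noticed after 05:00, but the attack is estimated at 02:30,
# so the recovery point has to predate the attack, not the discovery.
attack_estimate = datetime(2024, 1, 1, 2, 30)
restore_point = latest_backup_before(backups, attack_estimate)
print(restore_point)  # 2024-01-01 02:00:00
```

The point of the sketch is that the cutoff is the estimated time of the attack, not the time of discovery. If no backup predates the cutoff, the function returns None, which corresponds to having nothing clean to recover from.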
Once you understand the differences between replication and backup solutions in cloud environments, you need to decide on a data protection policy. To recover from outages, an extremely rare occurrence with big cloud providers, a replication solution will provide the best RPO (recovery point objective), although snapshot-based backup solutions can come close. In almost all other data loss scenarios, a backup solution will be necessary.
It all depends on cost and business needs. If you have a production application that must continue during an outage without losing any data at all, a replication solution is needed. If a small gap is allowed, a backup solution may be sufficient. And in any case, a backup solution is imperative to be able to recover from all other cases of data loss.
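As a rough illustration of the "small gap" question, the following Python sketch checks whether a backup schedule fits a tolerated amount of data loss. It assumes that the worst-case loss is approximately the longest interval between two consecutive backups, which ignores the time a backup takes to complete. The function names and the sample schedule are hypothetical.

```python
from datetime import datetime, timedelta


def largest_gap(backup_times):
    """Return the longest interval between consecutive backups, or None."""
    ordered = sorted(backup_times)
    if len(ordered) < 2:
        return None
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def backup_meets_rpo(backup_times, tolerated_loss):
    """True if no gap between consecutive backups exceeds the tolerated loss."""
    gap = largest_gap(backup_times)
    return gap is not None and gap <= tolerated_loss


# Hypothetical schedule: a snapshot every 15 minutes over two hours.
start = datetime(2024, 1, 1, 0, 0)
snapshots = [start + timedelta(minutes=15 * i) for i in range(9)]

print(backup_meets_rpo(snapshots, timedelta(minutes=30)))  # True
print(backup_meets_rpo(snapshots, timedelta(minutes=5)))   # False
```

If the tolerated loss is smaller than any backup interval you can realistically run, that is a sign that replication is needed in addition to backup.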