The terms backup, replication and disaster recovery are often confused and used interchangeably. Let’s try to define them more precisely:
- Backup (as defined in Wikipedia): “The process of copying and archiving computer data so it may be used to restore the original after a data loss event…” The article elaborates that “Backups have two distinct purposes. The primary purpose is to recover data after its loss, be it by data deletion or corruption…The secondary purpose of backups is to recover data from an earlier time, according to a user-defined data retention policy.”
- Replication (as defined in Wikipedia): “Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility…” The article defines data replication as “One speaks of replication if the same data is stored on multiple storage devices.”
- Disaster recovery (as defined in Wikipedia): “The process, policies and procedures that are related to preparing for recovery or continuation of technology infrastructure which are vital to an organization after a natural or human-induced disaster.”
Although what follows applies mostly to traditional environments and on-premises virtualized environments, I’ll focus on how these terms apply to the public cloud. Backup and replication remain much the same there: backup records the data at different points in time and keeps those copies for a predefined retention period, while replication maintains a continuously updated copy of the data alongside the original.
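To make the retention idea concrete, here is a minimal sketch of an age-based retention policy. The function name `prune_backups`, the daily schedule and the seven-day retention period are all hypothetical choices for illustration; real backup products usually offer richer policies (for example, keeping weekly or monthly copies for longer).

```python
from datetime import datetime, timedelta


def prune_backups(backup_times, now, retention):
    """Split backup timestamps into (kept, expired) for an age-based policy.

    A backup is kept if it is no older than `retention` relative to `now`.
    """
    cutoff = now - retention
    kept = [t for t in backup_times if t >= cutoff]
    expired = [t for t in backup_times if t < cutoff]
    return kept, expired


# Hypothetical schedule: one backup per day for the last ten days.
now = datetime(2024, 1, 31, 12, 0)
backups = [now - timedelta(days=d) for d in range(10)]

# Assumed policy: keep backups for seven days.
kept, expired = prune_backups(backups, now, retention=timedelta(days=7))
print(f"{len(kept)} backups kept, {len(expired)} expired")
```

A replica, by contrast, would correspond to holding only the newest state: there is no list of earlier points in time to go back to.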
What about disaster recovery (DR)? When we entrust our IT infrastructure to a major cloud provider, we enjoy the benefits of a large, state-of-the-art data center, one most companies could not afford to build on their own. Such a data center is less vulnerable to man-made and natural disasters. One extremely rare yet possible disaster scenario in the cloud is an outage spanning an entire data center. In AWS, for example, every region is divided into availability zones, which makes a region-wide outage highly improbable. A DR solution in these cloud environments is essentially the ability to recover systems and applications in another data center (e.g. in another AWS region). A DR solution can be based on either backup technologies or replication technologies. While an actual outage is the DR scenario most often considered, most data loss scenarios are in fact logical by nature and are not caused by hardware failures, outages or disasters.
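As one illustration of a backup-based DR approach on AWS, the sketch below requests a copy of an EBS snapshot into a second region. It assumes the `boto3` SDK and its EC2 `copy_snapshot` call; the helper name `copy_snapshot_to_dr_region`, the region names and the snapshot ID in the usage note are hypothetical, and a real setup would also need to handle encryption keys, tagging and waiting for the copy to complete.

```python
def copy_snapshot_to_dr_region(dr_ec2_client, source_region, snapshot_id):
    """Request a copy of an EBS snapshot into the DR region.

    `dr_ec2_client` is assumed to be an EC2 client bound to the *destination*
    (DR) region; the copy is requested from there and pulls the snapshot
    from `source_region`.
    """
    response = dr_ec2_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    # The response carries the ID of the new snapshot in the DR region.
    return response["SnapshotId"]
```

With credentials configured, it might be called as `copy_snapshot_to_dr_region(boto3.client("ec2", region_name="us-west-2"), "us-east-1", "snap-0123456789abcdef0")`, where both region names and the snapshot ID are placeholders.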
There are a few more important differences between backup and replication. In terms of RPO (recovery point objective), replication strives to keep the RPO close to zero, which means that if you need to recover, the data will be available in (almost) its most up-to-date state. With backup, the RPO depends on the frequency of backups: you can only go back to your most recent copy of the data. With modern snapshot-based backup solutions, backups can be very frequent and can reduce the RPO to minutes.
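The relationship between backup frequency and RPO can be sketched with a little arithmetic. In the example below, the helper names, the two schedules and the failure time are all made up for illustration; the point is only that the data lost is the gap between the failure and the last backup taken before it.

```python
from datetime import datetime, timedelta


def recovery_point(backup_times, failure_time):
    """Return the most recent backup taken at or before the failure, or None."""
    candidates = [t for t in backup_times if t <= failure_time]
    return max(candidates) if candidates else None


def data_loss_window(backup_times, failure_time):
    """Return how much recent data is lost when restoring the latest backup."""
    point = recovery_point(backup_times, failure_time)
    return None if point is None else failure_time - point


# Two hypothetical schedules over the same day.
start = datetime(2024, 1, 1, 0, 0)
hourly = [start + timedelta(hours=h) for h in range(24)]
every_5_min = [start + timedelta(minutes=5 * m) for m in range(288)]

# Assumed failure time.
failure = datetime(2024, 1, 1, 13, 47)

loss_hourly = data_loss_window(hourly, failure)
loss_5_min = data_loss_window(every_5_min, failure)
print("hourly backups lose:", loss_hourly)
print("5-minute snapshots lose:", loss_5_min)
```

In the worst case the loss approaches the full interval between backups, which is why shortening the interval is what brings a backup-based RPO down to minutes.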
Another key factor is the recovery time objective (RTO). In the past, traditional replication solutions kept a “hot standby” system and could recover immediately, whereas backup solutions needed to copy the data back into primary storage, an operation that could take hours or even days. With modern snapshot-based solutions, recovery of servers and data takes seconds to minutes, so recovery is almost immediate. As a result, there is currently no significant difference between replication and backup solutions in terms of RTO.
The decision to use a replication solution for recovery from crashes and outages is ultimately a business decision. Replication provides a better RPO, but a replication solution costs much more than a backup solution, especially if live standby servers or replicas are used. If your business can’t afford to lose even a single second of data in a crash, you’ll probably need a replication solution, although you will probably still need backup to protect against other types of data loss.
In the next post, Backup vs. Replication in the Cloud, Part II: Five Data Loss Scenarios, we will look at five different data loss and recovery scenarios and see how backup and replication solutions can be used to resolve them.