This guide explains a few key concepts to help you use CPM correctly.
By default, snapshots taken using CPM are crash-consistent. When you back up an EC2 instance at a certain time and later restore it from that backup, the instance starts up like a physical machine booting after a power outage. The file system and any applications using the EBS volumes were not prepared for the backup, or even aware that it was taking place, so they may have been in the middle of an operation or transaction.
Being in the middle of a transaction seems to imply that the backup will not be consistent, but in practice this is usually not the case. Most modern applications that deal with important business data are built for robustness. A modern database, be it MySQL, Oracle or SQL Server, has transaction logs. Transaction logs are kept separately from the data itself, and you can always replay the logs to get to a specific consistent point in time. A database can start after a crash and use its transaction logs to get to the most recent consistent state. File systems such as NTFS in Windows and EXT3 in Linux implement journaling, which works much like transaction logs in databases.
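The recovery idea described above can be illustrated with a toy model. This is a minimal sketch, not how any particular database implements its log: the record layout and the `recover` function are hypothetical, and the only assumption is that a transaction counts as durable once a commit record for it is in the log.

```python
from typing import Dict, List, Tuple

# Each log record is (transaction_id, operation, key, value).
# "set" records a data change; "commit" marks the transaction as durable.
LogRecord = Tuple[int, str, str, str]


def recover(log: List[LogRecord]) -> Dict[str, str]:
    """Rebuild a consistent state by replaying only committed transactions."""
    committed = {txn for txn, op, _, _ in log if op == "commit"}
    state: Dict[str, str] = {}
    for txn, op, key, value in log:
        if op == "set" and txn in committed:
            state[key] = value
    return state


# Transaction 2 was still in flight when the crash (or snapshot) happened,
# so it has no commit record and its write is ignored during recovery.
crash_log: List[LogRecord] = [
    (1, "set", "balance:alice", "90"),
    (1, "set", "balance:bob", "110"),
    (1, "commit", "", ""),
    (2, "set", "balance:alice", "40"),
]

print(recover(crash_log))
# → {'balance:alice': '90', 'balance:bob': '110'}
```

The half-finished transaction is simply discarded, which is why a crash-consistent snapshot of such a database can still come up in a consistent state.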
During application-consistent backups, an application can be informed that a backup is about to take place. The application can then prepare, freeze and thaw in the minimal required time: it performs whatever operations are needed to make sure the data on disk is consistent before the backup starts, makes minimal changes during backup time (backup mode), and returns to full-scale operation as soon as possible.
Application-consistent backups perform one more function, especially for databases. Databases keep transaction logs, which occasionally need to be deleted to recover storage space. This operation is called log truncation. When can transaction logs be deleted without impairing the robustness of the database? Typically, once you have made sure you have a successful backup of the database. In many cases, it is up to the backup software to notify the database that it can truncate its transaction logs.
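The truncation rule can be sketched as a small decision function. This is a conceptual model only: `LogSegment`, `truncatable` and the sequence numbers are hypothetical names for illustration, and real databases expose their own truncation mechanisms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogSegment:
    name: str
    last_lsn: int  # highest log sequence number stored in this segment


def truncatable(segments: List[LogSegment], backup_ok: bool, backup_lsn: int) -> List[str]:
    """Return the names of segments that are safe to delete after a backup.

    A segment is safe only if the backup succeeded and every record in it
    is already covered by that backup (last_lsn <= backup_lsn).
    """
    if not backup_ok:
        return []
    return [s.name for s in segments if s.last_lsn <= backup_lsn]


segments = [
    LogSegment("log.001", 100),
    LogSegment("log.002", 200),
    LogSegment("log.003", 300),
]

print(truncatable(segments, backup_ok=True, backup_lsn=200))
# → ['log.001', 'log.002']
print(truncatable(segments, backup_ok=False, backup_lsn=200))
# → []
```

Nothing is deleted after a failed backup, which is the reason the notification usually comes from the backup software rather than from a timer.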
When taking snapshots, the point in time is the exact time that the snapshot started. The content of the snapshot reflects the exact state of the disk at that point in time, regardless of how long it took to complete the snapshot.
When taking snapshots of multiple volumes, which is probably the most common case, it would be preferable for all the volumes to be at the exact same point in time. Unfortunately, AWS does not currently support such an option. Therefore, the best CPM can offer is taking the snapshots of multiple volumes in very close succession. In most cases this will not make a difference, but where an exact point in time across volumes/disks is needed, only backup scripts or VSS can achieve this goal. If the backup script of a backup policy flushes and locks all volumes in a synchronized manner, snapshots of this policy will reflect an exact point in time. VSS achieves the same goal, since by definition it performs shadow copies of multiple volumes in a synchronized manner. Freezing applications that use multiple volumes, like a database with one volume for data and a separate volume for transaction logs, also achieves the goal of backing up multiple volumes at a single point in time.
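As an illustration of the "flush and lock all volumes" approach on Linux, the sketch below freezes several mount points with the standard `fsfreeze` utility, starts the snapshots, and then thaws everything. It is a sketch under stated assumptions, not CPM's backup script interface: `snapshot_frozen`, `start_snapshots` and the mount points are hypothetical, and the command runner is injectable so the logic can be tried without root privileges.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run_command(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(list(cmd), check=True)


def snapshot_frozen(mount_points: List[str],
                    start_snapshots: Callable[[], None],
                    run: Runner = run_command) -> None:
    """Freeze all mount points, start the snapshots, then always thaw."""
    frozen: List[str] = []
    try:
        for mp in mount_points:
            run(["fsfreeze", "--freeze", mp])  # flushes and blocks new writes
            frozen.append(mp)
        # Hypothetical callback: whatever starts the snapshots of all volumes.
        start_snapshots()
    finally:
        # Thaw in reverse order, even if freezing or the snapshot call failed.
        for mp in reversed(frozen):
            run(["fsfreeze", "--unfreeze", mp])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    snapshot_frozen(["/data", "/logs"],
                    lambda: print("start snapshots"),
                    run=lambda cmd: print(" ".join(cmd)))
```

Because a snapshot reflects the moment it started, the volumes only need to stay frozen until every snapshot has been started, not until the snapshots complete.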
The type of backup to choose depends on your needs and limitations. Every approach has its pros and cons:
Crash-consistent backup, pros:
- Does not require writing any scripts.
- Does not require installing agents on Windows Servers.
- Does not affect the operation and performance of your instances and applications.
Crash-consistent backup, cons:
- Does not guarantee a consistent state of your applications.
- Does not guarantee an exact point in time across multiple volumes/disks.
- Provides no way to automatically truncate database transaction logs after backup.
Application-consistent backup, pros:
- Prepares the application for backup and therefore achieves a consistent state.
- Can ensure one exact point in time across multiple volumes/disks.
- Can truncate database transaction logs automatically.
Application-consistent backup, cons:
- May require writing and maintaining backup scripts.
- Requires installing a CPM Thin Backup Agent on Windows Servers.
- May slightly affect the performance of your application, especially during the freezing/flushing phase.