Disclaimer: The following information is an educated guess based on my experience with other similar mechanisms, in addition to my experience working on N2WS backup solutions for AWS cloud.
EBS snapshots, which are block-level incremental snapshots, have proven to be an efficient and cost-effective solution for backup in the EC2 environment. In order to provide reliable backup, the EBS snapshot mechanism needs to track the changes that occurred since the last time a snapshot was taken, or recognize if it was the first time a snapshot was taken.
This is done by a monitoring process that runs during snapshot operations and between snapshots. Each EBS volume has a bitmap in which blocks are marked as changed since the last snapshot. When the next snapshot is taken, the process knows which blocks need to be copied to S3 (which, according to AWS, stores the snapshot repository).
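The change-tracking idea can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the class name `TrackedVolume`, its methods, and the tiny block size are hypothetical, and nothing here reflects how AWS actually implements it.

```python
class TrackedVolume:
    """Toy volume that records which blocks changed since the last snapshot."""

    def __init__(self, num_blocks, block_size=4):
        self.blocks = [bytes(block_size)] * num_blocks
        # Every block starts marked so that the first snapshot is a full copy.
        self.dirty = [True] * num_blocks

    def write(self, index, data):
        self.blocks[index] = data
        self.dirty[index] = True      # mark the block in the bitmap

    def snapshot(self):
        """Return only the marked blocks, then clear the bitmap."""
        changed = {i: self.blocks[i] for i, d in enumerate(self.dirty) if d}
        self.dirty = [False] * len(self.blocks)
        return changed


volume = TrackedVolume(num_blocks=8)
full = volume.snapshot()          # first snapshot copies all 8 blocks
volume.write(2, b"AAAA")
volume.write(5, b"BBBB")
incremental = volume.snapshot()   # next snapshot copies only blocks 2 and 5
```

The returned dictionary stands in for the data that would be sent to the snapshot repository.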
Snapshot Consistency: Copy-on-Write
The snapshot needs to be consistent, i.e. it must reflect the exact image of the volume at the point in time the snapshot was taken. The mechanism that handles this is known as copy-on-write. Write operations can take place while a snapshot is in progress. Such a write may overwrite the data that was previously on a block, harming the consistency of the ongoing snapshot: data written after the snapshot started would be copied, and the snapshot would no longer represent the image of the volume at the point in time at which it started. Note that read operations do not interfere with the snapshot; only write operations can corrupt a snapshot if copy-on-write is not performed.
With copy-on-write, the old data in a block is copied out before the write erases it and the new data is written. When a write I/O request is received, the mechanism needs to decide whether or not to perform copy-on-write on the block. If the write targets a block that the copy operation has not yet reached, copy-on-write kicks in: it holds the write operation for a very short moment and copies that block out to S3. Only after the data is copied does the write operation resume, along with the sequential copy process.
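Here is a minimal sketch of that decision in Python, again as a guess at the general technique rather than the actual EBS implementation. The names `CowSnapshot`, `copy_next`, and `cursor` are hypothetical, and a plain dictionary stands in for the S3 repository.

```python
class CowSnapshot:
    """Toy point-in-time copy of a volume that stays consistent under writes."""

    def __init__(self, volume_blocks):
        self.volume = volume_blocks   # live volume: a list of bytes objects
        self.repository = {}          # stands in for the snapshot store
        self.cursor = 0               # next block the sequential copy handles

    def copy_next(self):
        """One step of the sequential copy. Returns False when finished."""
        if self.cursor >= len(self.volume):
            return False
        # Skip blocks already preserved by an earlier copy-on-write.
        if self.cursor not in self.repository:
            self.repository[self.cursor] = self.volume[self.cursor]
        self.cursor += 1
        return True

    def write(self, index, data):
        """Write to the live volume, preserving the old block first if needed."""
        if index >= self.cursor and index not in self.repository:
            # The write is ahead of the copy: save the old data first.
            self.repository[index] = self.volume[index]
        self.volume[index] = data


volume = [b"a", b"b", b"c", b"d"]
snap = CowSnapshot(volume)
snap.copy_next()          # block 0 is copied
snap.write(2, b"X")       # ahead of the copy: the old b"c" is preserved first
snap.write(0, b"Y")       # behind the copy: nothing extra to do
while snap.copy_next():
    pass
image = [snap.repository[i] for i in range(len(volume))]
```

After the loop, `image` holds the volume as it was when the snapshot started, while `volume` contains the newer writes.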
EBS Volume Restore
The EBS restore process creates a volume from snapshots. It works much like copy-on-write but in the opposite direction, copying data from the snapshot repository to the disk. In this case, it holds read operations rather than writes.
The restore process uses a bitmap of the blocks that need to be restored to recognize exactly which blocks must be copied from the snapshots. It then copies the blocks from the snapshot repository to the disk in order. Once it starts copying data to the volume, it listens for read and write requests to the disk. If a write is performed during restoration on a block that is marked in the bitmap (i.e. not empty and not yet restored), the block is unmarked so that the snapshot data will not overwrite the new data.
It is important to note that a volume becomes available as soon as the restore operation begins, even though the actual data has not yet been fully copied to the disk. If a read operation is performed during a restore on blocks that have not yet been written, the read is temporarily suspended, the required blocks are copied from the repository, and then the read operation resumes, as does the sequential restore process.
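The restore behavior described above can be sketched the same way. This is an illustrative model only: `LazyRestore` and its methods are hypothetical names, a set plays the role of the bitmap, and writes are assumed to replace whole blocks (a partial write would first have to fetch the block).

```python
class LazyRestore:
    """Toy restore: the volume is usable while blocks are still being copied."""

    def __init__(self, snapshot_blocks):
        self.snapshot = snapshot_blocks       # dict: block index -> bytes
        self.volume = {}                      # the volume being restored
        self.pending = set(snapshot_blocks)   # "bitmap" of blocks still to copy

    def _fetch(self, index):
        self.volume[index] = self.snapshot[index]
        self.pending.discard(index)

    def restore_next(self):
        """One step of the sequential restore. Returns False when finished."""
        if not self.pending:
            return False
        self._fetch(min(self.pending))
        return True

    def read(self, index):
        if index in self.pending:
            # Hold the read, pull the block from the repository, then continue.
            self._fetch(index)
        return self.volume.get(index, b"\x00")

    def write(self, index, data):
        # Unmark the block so the snapshot data never overwrites the new data.
        self.pending.discard(index)
        self.volume[index] = data


restore = LazyRestore({0: b"a", 1: b"b", 2: b"c"})
first = restore.read(2)       # block 2 is fetched on demand
restore.write(1, b"NEW")      # block 1 is unmarked and never restored
while restore.restore_next():
    pass
```

At the end, block 1 keeps the newly written data and the other blocks match the snapshot.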
Great Cloud Advantages
EBS restore is very efficient, since it makes a large amount of data available in seconds. The main advantages of EBS snapshots are that data is quickly backed up to S3, which is a cost-effective, reliable infrastructure, and that they provide block-level incremental backup that saves storage space.
The most compelling advantage, however, is the snapshots’ ability to perform “rapid recovery” during restoration. Storage arrays in traditional data centers (EMC, NetApp, etc.) also provide rapid recovery, but typically use different mechanisms for snapshots, such as Split-Mirror. These mechanisms do not usually copy blocks at all.
From the moment the snapshot begins, new writes are redirected to another area on the disk, a bitmap of their locations is stored, and the old volume is left as is. The disadvantage of hardware snapshots, as mentioned in Oracle’s documentation about backup, is that although they back up data quickly, the backed-up data is not actually copied anywhere.
Both the production disk and the snapshot are located on the same hardware. This is a risky option if it is your only backup plan, even though these devices are very reliable. EBS snapshots are a great option because the snapshot data is kept in S3, an infrastructure separate from the EBS volume itself.