The previous post, Disk Array Backup on EC2 Part II: Consistency Issues, described the challenges in performing EBS snapshot-based backups for disk arrays on EC2.
When performing a backup of an LVM data volume that spans multiple EBS volumes, one needs to make sure the snapshots of all the underlying volumes capture a consistent state. AWS APIs do not currently allow snapshots of multiple volumes to be taken at the exact same point in time.
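To make the skew concrete, here is a minimal sketch using the AWS CLI (the volume IDs are hypothetical): the snapshots are initiated one API call at a time, so each captures a slightly different point in time.

#!/bin/bash
# two hypothetical EBS volumes backing the same LVM volume group
for vol in vol-11111111 vol-22222222; do
    # each call initiates its own snapshot, so the two points in time
    # are close, but not identical
    aws ec2 create-snapshot --volume-id "$vol"
done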
LVM supports its own snapshots. An LVM snapshot resides in the same volume group as the original volume, and stores all the changes made to the original volume since the snapshot’s point in time. LVM snapshots shouldn’t be used as a backup infrastructure on their own because:
- LVM uses a copy-on-write mechanism: before a block on the original volume is modified, its original content is copied to the snapshot area. This consumes resources and may affect the logical volume’s I/O performance.
- The snapshot consumes storage space from the same volume group as the primary data.
- The snapshot relies on the primary data, so if there’s data loss, corruption or an outage, snapshots will be as useless as the primary data.
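As a minimal sketch of the last two points (the volume group and volume names are hypothetical): the snapshot is carved out of the same volume group as the original volume, and lvs shows how much of it the copy-on-write data has consumed.

# allocate a 1 GiB snapshot from the free space of the same volume group
lvcreate -n demosnap -L 1G -s /dev/VolGroup00/MyVol00
# the Data% column reports how full the snapshot's copy-on-write area is
lvs VolGroup00
# release the space
lvremove --force /dev/VolGroup00/demosnap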
However, LVM snapshots can be a powerful way to ensure consistency when used in conjunction with EBS snapshots. All you need to do is create an LVM snapshot with minimal storage space allocated just before the EBS snapshots are taken, and remove it right after the EBS snapshots have been initiated. With this approach, LVM snapshots exist only for a very short time, so they neither affect performance nor consume significant storage space. Although they are quickly removed from the live system, they are captured inside the EBS snapshots, so after recovery they can be used to revert the volume to its consistent state.
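Outside of any particular backup tool, the sequence looks roughly like this; a hedged sketch, assuming the AWS CLI runs on a control machine and that the key file, host, volume names and EBS volume IDs are hypothetical:

#!/bin/bash
# 1. pin a consistent point in time with a small LVM snapshot
ssh -i key.pem user@instance \
    "lvcreate -n snapvol -L 1G -s /dev/VolGroup00/MyVol00"

# 2. initiate the EBS snapshots of every volume in the array;
#    each call returns as soon as the snapshot has started
for vol in vol-11111111 vol-22222222; do
    aws ec2 create-snapshot --volume-id "$vol" \
        --description "LVM-consistent disk array backup"
done

# 3. remove the LVM snapshot; the EBS snapshots already contain it
ssh -i key.pem user@instance \
    "lvremove --force /dev/VolGroup00/snapvol"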
By way of example, let’s assume N2WS is used, although the same approach works elsewhere. N2WS allows creating a backup policy for a group of instances or EBS volumes, and defining scripts to be executed just before and right after the EBS snapshots are taken, known as the “before” script and the “after” script.
These scripts run on the N2WS server instance and use SSH to connect to the backed-up instance, performing whatever operations are needed. Assuming the “before” script first quiesces the application (if needed), we’ll review here only the LVM snapshot code:
#!/bin/bash
# connect to the backed-up instance and create a small LVM snapshot
# named snapvolcpm with 1 GiB of copy-on-write space
ssh -i <ssh private key file> <user>@<address of instance> "lvcreate -n snapvolcpm -L 1G -s /dev/mapper/VolGroup00-MyVol00"
# ssh returns the exit status of the remote command
if [ $? -gt 0 ]; then
    echo "Failed taking lvm snapshot" 1>&2
    exit 1
else
    echo "lvm snapshot succeeded" 1>&2
fi
This script connects to the instance over SSH and runs a command that creates an LVM snapshot with a pre-defined name. This version is a bit simplified; a production script would typically include additional error checking.
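As one example of such hardening, a production “before” script might remove a leftover snapshot from a previous failed run and loop over all relevant volume groups; a hedged sketch with hypothetical key, host and volume names:

#!/bin/bash
# hypothetical list of volume-group/logical-volume pairs to protect
for lv in VolGroup00/MyVol00 VolGroup01/MyVol01; do
    vg=${lv%/*}
    # drop a leftover snapshot from a previous failed run (ignore errors),
    # then create a fresh one; $vg and $lv expand locally before ssh runs
    if ! ssh -i key.pem user@instance \
        "lvremove --force /dev/$vg/snapvolcpm 2>/dev/null; \
         lvcreate -n snapvolcpm -L 1G -s /dev/$lv"; then
        echo "Failed taking lvm snapshot of $lv" 1>&2
        exit 1
    fi
done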
If the backup policy contains more than one LVM volume group, you may need to create more than one LVM snapshot, as in the sketch above. In the “after” script, the LVM snapshots are simply removed:
#!/bin/bash
# N2WS passes the "before" script's status as the first argument;
# 0 means the "before" script failed
if [ "$1" -eq 0 ]; then
    echo "There was an issue running the first script" 1>&2
fi
# remove the LVM snapshot; the EBS snapshots, already initiated,
# have captured it at the consistent point in time
ssh -i <ssh private key file> <user>@<address of instance> "lvremove --force /dev/VolGroup00/snapvolcpm"
if [ $? -gt 0 ]; then
    echo "Failed removing lvm snapshot" 1>&2
    exit 1
else
    echo "remove lvm snapshot succeeded" 1>&2
fi
With these scripts configured, the volume group contains no LVM snapshots most of the time, but every set of EBS snapshots contains an LVM snapshot that can be used in case of recovery. When recovering the instance with all its EBS volumes, the first thing to do is merge the LVM snapshot back, reverting the volume to its consistent state. After logging in to the newly created instance, type:
# unmount the filesystem so the merge can proceed
umount /dev/mapper/VolGroup00-MyVol00
# merge the snapshot back into the origin, reverting it to the snapshot's point in time
lvconvert --merge /dev/VolGroup00/snapvolcpm
# remount the reverted volume
mount /dev/mapper/VolGroup00-MyVol00 /volume_mountpoint
These commands unmount the volume, merge the snapshot back into the main volume, and then remount it.
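If you want to verify the state before and after the merge, lvs can help; a small sketch with the same hypothetical names:

# before merging: confirm the snapshot captured inside the EBS snapshots exists
lvs VolGroup00
# while the merge is running, re-running lvs shows the remaining copy
# percentage dropping until the snapshot volume disappears
lvs -a VolGroup00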
With this approach, you can perform backups using EBS snapshots, with all their advantages, and still make sure the backup is consistent at the volume-manager level.