Disk Array Backup on EC2 Part II: Consistency Issues


In Part I of our series, we discussed what disk arrays are, and why and how they are used in the EC2 environment.

So, how do we ensure the consistency of our backups with disk arrays? When thinking about the storage stack, one needs to remember that each layer is unaware of how the layers beneath it are implemented. For instance, most storage layers implement their own caching to improve performance.


So, an application may have in-memory caching, but it doesn’t always know if and how the file system implements its own caching. That said, an application has ways to ask the file system to flush its data, typically through explicit synchronization APIs (such as fsync on POSIX systems). Closing a file also hands any buffered data to the file system, although on its own it does not always guarantee that the data has reached the disk. The same can’t be said of the relationship between the file system and the volume beneath it.
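As a minimal sketch of what "asking the file system to flush" can look like from an application, the Python snippet below writes a file and then calls `os.fsync`. The `durable_write` helper and the file name are hypothetical and exist only for illustration; real applications (databases in particular) usually expose their own flush or lock commands instead.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data, then ask the OS to push it down to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # drain Python's user-space buffer into the OS
        os.fsync(f.fileno())  # ask the kernel to write the file's cached data to the device


if __name__ == "__main__":
    # Hypothetical target file, used only to demonstrate the call.
    target = os.path.join(tempfile.mkdtemp(), "record.bin")
    durable_write(target, b"committed")
```

Note that this only covers the application-to-file-system boundary. It says nothing about whether the members of a multi-disk array are mutually consistent at the moment a snapshot is taken.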

It’s possible to take a snapshot of a disk (e.g. an EBS volume), but if not all cached data was flushed to that disk prior to the snapshot, the result will be an inconsistent image. In most cases, applications manage to overcome such inconsistency, just as they recover from a system crash following a power outage. The case of disk arrays is especially delicate, because even if caches are flushed, the underlying disks may not all be captured at exactly the same point in time.

To ensure the backup consistency of disk arrays, the first recommendation is to bring the application into a consistent state. This is usually accomplished using the various APIs that applications provide, for example Oracle, MySQL, MongoDB and more (some examples in our blog). When the volume consists of a single EBS volume (or disk), application-level quiescence is good enough. When using a disk array with multiple EBS volumes, it is safer to also bring the volume manager into a consistent state, so that all member volumes are synchronized across their snapshots. This should be done after the application is already in a consistent state. Thus, the order of operations should be:

Application – make consistent
Volume Manager – make consistent
Take EBS snapshots
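The ordering above can be sketched as a small orchestration function. This is an illustration only: the five callables are hypothetical placeholders for whatever your application, volume manager and snapshot tooling actually provide. The one design choice worth noting is that the layers are released in reverse order, even if a step fails.

```python
from contextlib import ExitStack
from typing import Callable, List


def consistent_backup(
    quiesce_app: Callable[[], None],
    resume_app: Callable[[], None],
    quiesce_volumes: Callable[[], None],
    resume_volumes: Callable[[], None],
    take_snapshots: Callable[[], List[str]],
) -> List[str]:
    """Quiesce top-down, snapshot, then resume bottom-up."""
    with ExitStack() as stack:
        quiesce_app()                   # 1. application: make consistent
        stack.callback(resume_app)      #    registered first, so it runs last
        quiesce_volumes()               # 2. volume manager: make consistent
        stack.callback(resume_volumes)  #    registered last, so it runs first
        return take_snapshots()         # 3. take EBS snapshots
```

Because `ExitStack` runs its callbacks in last-in, first-out order, the volume layer is resumed before the application, and both are resumed even when `take_snapshots` raises.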

Different Environments

Windows Dynamic Disks

Windows is a well-integrated environment. Disk arrays are typically defined using a built-in Windows mechanism called Dynamic Disk. Windows Dynamic Disk allows you to define software RAIDs, volumes spanning multiple disks and more. In Windows, backup of the whole storage stack is covered by an infrastructure called VSS (Volume Shadow Copy Service), which in most cases includes applications as well. All Microsoft applications, like SQL Server, Exchange and SharePoint, support consistent backup via VSS. When using a backup solution supporting VSS (e.g. Cloud Protection Manager), everything is covered, including the file system and Dynamic Disk volumes.

Linux Mdadm

In Linux, mdadm is a utility used to create and manage software RAIDs. It is not in itself a volume manager, and it has no specific support for snapshots or backup. In many cases mdadm is used in conjunction with LVM (Logical Volume Manager), which does support snapshots. When using mdadm on its own, one can only rely on application-level and file system-level consistency methods. The xfs file system in Linux, for example, allows flushing and freezing the file system, which can reduce the risk of inconsistency when taking snapshots of an mdadm software RAID. When using LVM, whether in conjunction with mdadm or without it, there are internal LVM snapshots that can be utilized for EC2 backup using EBS snapshots.
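For the mdadm-only case, a rough sketch of the freeze-then-snapshot approach might look like the following. It assumes that the `fsfreeze` utility (from util-linux) and the AWS CLI are installed, that the array is mounted at a non-root mount point, and that the caller already knows the EBS volume IDs backing the array. The function name, the mount point and the volume IDs are all hypothetical. The `run` parameter exists only so the command sequence can be inspected without executing anything.

```python
import subprocess
from typing import Callable, Sequence


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


def snapshot_frozen_array(
    mount_point: str,
    volume_ids: Sequence[str],
    run: Callable[[Sequence[str]], None] = run_checked,
) -> None:
    """Freeze the file system, start a snapshot of each member volume, unfreeze."""
    run(["sync"])                                   # flush dirty pages first
    run(["fsfreeze", "--freeze", mount_point])      # block new writes to the file system
    try:
        for volume_id in volume_ids:
            run([
                "aws", "ec2", "create-snapshot",
                "--volume-id", volume_id,
                "--description", f"array member of {mount_point}",
            ])
    finally:
        # Always unfreeze, even if a snapshot request failed.
        run(["fsfreeze", "--unfreeze", mount_point])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    snapshot_frozen_array(
        "/mnt/array",                # hypothetical mount point
        ["vol-aaaa", "vol-bbbb"],    # hypothetical volume IDs
        run=lambda cmd: print(" ".join(cmd)),
    )
```

The file system stays frozen only while the snapshot requests are issued, since an EBS snapshot captures the volume as of the moment it is initiated. Keeping that window short matters because writers to the frozen file system block until it is unfrozen.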

In the third part of this series, Disk Array Backup on EC2 Part III: LVM backup with EBS Volumes, we will see how to perform such a backup.
