Introduction to EBS Volumes and Snapshots

EBS Volumes

Elastic Block Store (EBS) volumes are persistent disks that you define in Amazon’s EC2 compute cloud and attach to EC2 instances. EBS volumes exist separately from the instances they are attached to and “live on” even after those instances are terminated (assuming they were not explicitly configured to be deleted with the instance).

EBS volumes are the counterpart of the LUNs (external disks) that storage arrays provide in the traditional data center. There, you typically have storage arrays on which you define dynamic disks, or LUNs, and attach them to different servers (or hosts). We do not know how EBS is implemented in the EC2 infrastructure. The volumes may be implemented using commercial storage arrays (e.g. EMC, NetApp, IBM) but are more likely implemented using simple disks to make them more cost-effective. As with storage arrays, AWS builds in enough redundancy to avoid data loss when hardware components fail. That redundancy may be implemented using one of the good old RAID algorithms or by another method.

You can create, delete, attach and detach EBS volumes using the AWS Management Console, another EC2 management tool, or the API. At the time of writing, an account is limited to 5000 volumes or 20TB of total EBS storage, but that limit can be increased by submitting a special request form. Each EBS volume is limited to a maximum size of 1TB at the time of writing.
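To make the API route a little more concrete, here is a minimal sketch of the create, attach, detach and delete cycle. It assumes the boto3 library for Python, which this article does not otherwise cover; the function names, the default device name and the IDs in the usage comment are hypothetical placeholders. The functions take the EC2 client as a parameter, so nothing here touches your account until you pass in a real client.

```python
def provision_and_attach(ec2, zone, size_gib, instance_id, device="/dev/sdf"):
    """Create a volume, wait until it is usable, attach it, and return its ID.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    volume = ec2.create_volume(AvailabilityZone=zone, Size=size_gib)
    volume_id = volume["VolumeId"]
    # A new volume starts out in the "creating" state; wait before attaching.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
    return volume_id


def detach_and_delete(ec2, volume_id):
    """Detach a volume, wait until it is free again, then delete it."""
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.delete_volume(VolumeId=volume_id)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   vol_id = provision_and_attach(ec2, "us-east-1a", 10, "i-0123456789abcdef0")
#   detach_and_delete(ec2, vol_id)
```

Note that the volume has to be created in the same availability zone as the instance it will be attached to, and that the operating system on the instance still has to format and mount the new device before applications can use it.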

AWS takes care of attaching and detaching volumes at the infrastructure level, i.e. you don’t have to configure any storage connectivity on the instance itself. It is commonly assumed that, under the covers, iSCSI is used as the method of storage attachment. This shouldn’t bother users, except that if actual networking is used as the means of IO for these volumes, their performance can be affected by the amount of traffic on the network. AWS also offers Provisioned IOPS EBS volumes. IOPS stands for IO operations per second, and a Provisioned IOPS volume is a type of volume that guarantees a certain service level or performance, which may be important to critical applications. Provisioned IOPS volumes are more expensive than regular EBS volumes, and they deliver their provisioned performance most reliably when attached to instance types that are defined as “EBS-optimized instances.”

EBS Snapshots

Like most storage arrays in the traditional data center, EBS volumes also come with snapshot capabilities. A snapshot starts at a certain point-in-time and stores the exact image of the EBS volume at that point-in-time. Since it takes time to create a snapshot, and since the volume is “live” during the snapshot creation, some method needs to be applied to ensure that the image is of the volume at the exact point-in-time when the snapshot started. Hardware snapshots in traditional storage arrays use a few possible methods. Some of them write new changes to a different location on disk (an approach usually called redirect-on-write) and keep bitmaps that map both the static snapshot image and the live volume.

AWS keeps EBS snapshots in S3 (AWS’s storage cloud). This implies that the snapshot data is actually copied out of the EBS volume to another location (S3), so we can conclude that the method used is probably copy-on-write. With copy-on-write, the snapshot process starts copying the data from the volume in the background. When a write operation is about to change data that has not been copied yet, the snapshot process pauses that write for a moment and copies out the old data before it is overwritten; once the old data is copied out, the write resumes. Copy-on-write does not stop or fail any IO requests, but it can cause a certain performance penalty until the copy operation completes.
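The copy-on-write behavior described above can be illustrated with a small in-memory simulation. This is only a toy model of the general technique, not AWS’s actual implementation (which, as noted, we can only guess at); the class and method names are made up for the example.

```python
class CowSnapshot:
    """Point-in-time image of a live volume, copied out in the background."""

    def __init__(self, volume):
        self.volume = volume                     # live list of blocks
        self.image = {}                          # block index -> data at snapshot start
        self.pending = set(range(len(volume)))   # blocks not copied out yet

    def copy_next(self):
        """Background step: copy one not-yet-copied block to the snapshot."""
        if self.pending:
            self._copy_out(min(self.pending))

    def write(self, index, data):
        """A write to the live volume that honors copy-on-write."""
        if index in self.pending:
            # The old data has not been saved yet: copy it out first, then
            # let the write proceed. This is the brief extra latency.
            self._copy_out(index)
        self.volume[index] = data

    def _copy_out(self, index):
        self.image[index] = self.volume[index]
        self.pending.discard(index)

    def done(self):
        return not self.pending


volume = ["a0", "b0", "c0", "d0"]
snap = CowSnapshot(volume)
snap.copy_next()        # block 0 is copied in the background
snap.write(2, "c1")     # block 2 not copied yet: old "c0" is saved first
snap.write(0, "a1")     # block 0 already copied: plain write
while not snap.done():
    snap.copy_next()

print([snap.image[i] for i in range(len(volume))])  # ['a0', 'b0', 'c0', 'd0']
print(volume)                                        # ['a1', 'b0', 'c1', 'd0']
```

The snapshot image reflects the volume exactly as it was when the snapshot started, even though two writes landed while the copy was still in progress. Only the write to the not-yet-copied block paid the extra copy step.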

Here are a few additional things we know about EBS snapshots:

  • EBS snapshots are “block-level incremental”: This means that every snapshot copies only the blocks (or areas) in the volume that were changed since the last snapshot (the first one is a “full” snapshot). This affects the user in two ways. First, it is much more cost-effective: you only pay for what has changed each time. Second, performance-wise, EBS snapshots are very fast because they only need to copy the incremental delta since the last snapshot was taken. One more interesting point about block-level incremental snapshots is that in many cases you will pay the same whether you take 1 snapshot a day or 6 snapshots a day, since for every snapshot you only pay for the changes. This depends on the write pattern in your volumes: if many areas in the volume are overwritten several times a day, this assumption does not hold.
  • EBS snapshots are “content-aware”: Even when you take the first full snapshot of an EBS volume, it only copies the areas in the volume that contain data. Areas in the volume that were never written to are not copied, which saves time and storage space.
  • EBS snapshots are stored in S3: We already mentioned that, but it is an important point, since S3 is designed for very high data durability. Another important point is that EBS snapshots are kept in a separate infrastructure from the actual EBS storage, which means that a failure in the production data will most probably not affect the snapshot data. Also, EBS volumes are stored in a specific availability zone and can connect only to instances in the same zone. However, EBS snapshots can be restored to any availability zone in the region, which means that if there’s an outage in one availability zone, you can still use your snapshots to recover your data to a different zone.
  • EBS snapshots can be copied between regions: This is a newer feature, and it means that you can use EBS snapshots as part of a disaster recovery plan that keeps your data available and recoverable even if a whole region is down.
  • EBS snapshots are crash-consistent: We already said that snapshots are a consistent image of a specific point-in-time. But what if that point-in-time was in the middle of writing a file, or of a transaction in a database? EBS snapshots give you the image of a volume you would expect to find after a machine crashes, as if someone had pulled out the power cord. In most cases that will be okay, but in some cases the data on the volume may be corrupted. To overcome this, you need to apply some consistency mechanism to applications (e.g. databases) or file systems to make sure that the snapshot is taken at a “good” point-in-time. You can do it yourself by writing scripts, or you can use a solution like Cloud Protection Manager (CPM) to help you achieve this goal. You can read more about that in the user’s guide on CPM’s documentation page.
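The “block-level incremental” and “content-aware” points in the list above, including the 1-versus-6-snapshots-a-day observation, can be checked with a toy model. It assumes a volume is represented as a dictionary that only holds blocks that were actually written, and it detects changes by comparing block contents, which is a simplification. The function names and sample data are made up for the illustration.

```python
def take_snapshot(volume, previous=None):
    """Return the blocks a new snapshot would have to store.

    volume:   dict of block index -> data; never-written blocks are absent.
    previous: the volume's contents at the last snapshot, or None.
    """
    if previous is None:
        # First ("full") snapshot: only blocks that hold data are copied.
        return dict(volume)
    # Later snapshots: only blocks that differ from the last snapshot.
    return {i: d for i, d in volume.items() if previous.get(i) != d}


def blocks_stored(write_batches, snapshots_per_day):
    """Total blocks stored by incremental snapshots over one day of writes."""
    volume, last_view, total = {}, None, 0
    step = len(write_batches) // snapshots_per_day
    for n, batch in enumerate(write_batches, start=1):
        volume.update(batch)
        if n % step == 0:
            total += len(take_snapshot(volume, last_view))
            last_view = dict(volume)
    return total


# Six write periods in a day, each touching a different block.
spread_out = [{0: "a"}, {1: "b"}, {2: "c"}, {3: "d"}, {4: "e"}, {5: "f"}]
# Six write periods that all overwrite the same "hot" block.
hot_block = [{0: "v1"}, {0: "v2"}, {0: "v3"}, {0: "v4"}, {0: "v5"}, {0: "v6"}]

print(blocks_stored(spread_out, 1), blocks_stored(spread_out, 6))  # 6 6
print(blocks_stored(hot_block, 1), blocks_stored(hot_block, 6))    # 1 6
```

When writes are spread over different blocks, one snapshot a day and six snapshots a day store the same total amount of data. When the same block is overwritten again and again, each extra snapshot captures another version of it, so more frequent snapshots do cost more.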

EBS snapshots are probably the best way to back up your EBS volumes. You may want to use an EBS snapshot management solution to help you automate, manage and monitor snapshots, and to take care of tasks such as scheduling, retention management (deletion of old snapshots), application support and recovery operations.
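As an illustration of what retention management involves, here is a minimal sketch of a policy that picks snapshots older than a given number of days for deletion. It is not how CPM or any other product works; the function name is made up. The dictionary keys "SnapshotId" and "StartTime" are chosen to match the fields boto3 returns when describing snapshots, on the assumption that you would feed it such a list.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshots, keep_days, now=None):
    """Return the IDs of snapshots older than keep_days, newest first.

    snapshots: list of dicts with "SnapshotId" and a timezone-aware
               "StartTime" datetime.
    The newest snapshot is always kept, whatever its age.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    ordered = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    # Skip the newest snapshot so a volume is never left without a backup.
    return [s["SnapshotId"] for s in ordered[1:] if s["StartTime"] < cutoff]
```

Because snapshots are incremental, deleting an old snapshot removes only the data that no later snapshot still needs, so a policy like this does not break the remaining snapshots.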

Cloud Protection Manager (CPM) is a comprehensive EC2 backup solution that uses EBS snapshots.

You can find more information on EBS volumes and snapshots on the AWS website: aws.amazon.com/ebs/