Let’s Talk About Block Storage – part 2

In the first part of this two-part series, we presented the main characteristics of the block storage types available on the Amazon public cloud. To recap, AWS offers instance storage and Elastic Block Store (EBS), with the second option being much more popular among EC2 users. This is due to the benefits of data persistence, snapshotting and cloning, encryption, and 99.999% availability. EBS volumes fall into two main categories, SSD-backed and HDD-backed, which differ in performance and available volume sizes. The question, however, is how to know when to use which type, and what to pay attention to so that you do not create an I/O bottleneck in your application.

When to Use SSD-Backed Volumes?

The main characteristics of SSD-backed volumes are low latency and performance measured in IOPS, making them a good fit for intensive transactional workloads where the volume performs many small read and write operations. This type of workload is typical of RDBMS and NoSQL databases, but it can also be seen in production application instances or any dev/test environment.

General Purpose SSD (gp2) is the entry category for SSD-backed volumes, and while this AWS EBS volume type shouldn’t be your main choice for crucial production instances, don’t count gp2 out completely. If you provision a larger gp2 volume, you will benefit from increased IOPS (remember, it’s a guaranteed 3 IOPS per provisioned GB, up to a maximum of 10,000 IOPS per volume), and credits accumulate over time. If your workload is not predictable and alternates between quiet periods and peaks, you may be able to cover the peaks with burst credits, provided the volume is sized correctly and you don’t quickly deplete all your “earned” credits. So, when you spin up a new instance with a gp2 EBS volume, pay attention to the size and the IOPS your application requires, and see whether you can manage with gp2 or need to move to the more expensive provisioned IOPS volumes.
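The sizing rule above can be sketched in a few lines. This is a minimal illustration that uses only the figures quoted in this article (3 IOPS per provisioned GiB, 10,000 IOPS per volume); the function names are hypothetical, and the sketch deliberately ignores burst credits and any minimum IOPS floor AWS may apply.

```python
import math

# Figures quoted in the article; adjust them if the AWS limits change.
IOPS_PER_GIB = 3          # baseline IOPS per provisioned GiB
MAX_VOLUME_IOPS = 10_000  # per-volume ceiling


def baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a General Purpose SSD volume of the given size."""
    return min(size_gib * IOPS_PER_GIB, MAX_VOLUME_IOPS)


def min_size_for_iops(target_iops: int) -> int:
    """Smallest volume size (GiB) whose baseline meets target_iops."""
    if target_iops > MAX_VOLUME_IOPS:
        raise ValueError("target exceeds the per-volume ceiling")
    return math.ceil(target_iops / IOPS_PER_GIB)
```

For example, `baseline_iops(500)` gives 1,500 IOPS, and a sustained 3,000 IOPS would need a volume of at least `min_size_for_iops(3000)` = 1,000 GiB if you do not want to rely on credits.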

Provisioned IOPS SSD (io1) supports up to a maximum of 20,000 IOPS per volume, which can be increased further, up to 75,000 IOPS per instance, by striping multiple EBS io1 volumes with RAID 0 (preferably with LVM). Customers using this EBS volume type also need to keep an eye on the IOPS-to-GiB ratio, which can be at most 50:1 (2:1 is recommended for the best per-I/O latency). Of course, you will pay the highest price for this type of EBS storage ($0.138 per GiB compared to $0.11 with gp2, plus an additional $0.072 per provisioned IOPS), meaning that for the smallest volume that can carry the maximum (400 GiB with 20K IOPS) you will pay almost $1,500 per month! And imagine keeping a 2:1 ratio combined with maximum IOPS: that would mean provisioning a volume of roughly 10 TiB and spending a small fortune on it.
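The arithmetic behind those numbers is easy to reproduce. The sketch below assumes the prices quoted in this article (they vary by region and over time) and treats 50:1 as the upper bound on the IOPS-to-GiB ratio, which is what the 400 GiB / 20K IOPS example implies. The function names are illustrative only.

```python
import math

# Prices and limits as quoted in the article (assumed, region dependent).
STORAGE_PRICE_PER_GIB = 0.138  # USD per GiB per month
PRICE_PER_IOPS = 0.072         # USD per provisioned IOPS per month
MAX_IOPS_PER_GIB = 50          # upper bound on the IOPS-to-GiB ratio


def io1_min_size_gib(iops: int, ratio: int = MAX_IOPS_PER_GIB) -> int:
    """Smallest volume size that can carry `iops` at the given ratio."""
    return math.ceil(iops / ratio)


def io1_monthly_cost(size_gib: int, iops: int) -> float:
    """Monthly cost in USD for one io1 volume (storage plus IOPS)."""
    if iops > size_gib * MAX_IOPS_PER_GIB:
        raise ValueError("IOPS-to-GiB ratio exceeds 50:1")
    return size_gib * STORAGE_PRICE_PER_GIB + iops * PRICE_PER_IOPS


if __name__ == "__main__":
    for ratio in (50, 2):
        size = io1_min_size_gib(20_000, ratio)
        print(f"{ratio}:1 -> {size} GiB, ${io1_monthly_cost(size, 20_000):,.2f}/month")
```

At 50:1 this works out to 400 GiB and roughly $1,495 per month; at 2:1 it is 10,000 GiB and roughly $2,820 per month. In both cases the IOPS charge ($1,440) dominates, so the ratio mainly changes how much storage you pay for on top of it.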

With both SSD-backed volume types, the key is to plan and benchmark your workloads. To do so, experiment with fio and try to understand whether your workload can tolerate the limitations of gp2 volumes. If not, you may need to provision io1 volumes. Disk space usage also plays an important role, since size affects performance with both SSD-backed volume types. It is also important to know that SSD-backed volumes are the only EBS types from which you can boot your instances. Since Windows boot partitions tend to grow quite large due to frequent updates and service packs, you will need to consider the size of the boot volume and its expected growth over time.
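As a starting point for such a benchmark, the sketch below wraps fio from Python. It assumes fio is installed on the instance with the libaio engine available (Linux), and the test file path, block size, queue depth, and read/write mix are placeholder values that should be replaced with numbers that resemble your own workload. The helper names are hypothetical.

```python
import json
import subprocess


def build_fio_command(target_file, read_pct=70, block_size="16k",
                      runtime_s=60, size="1G"):
    """Build an fio command line for a mixed random read/write test."""
    return [
        "fio",
        "--name=ebs-randrw",
        f"--filename={target_file}",   # file on the EBS volume under test
        "--rw=randrw",                 # random reads and writes
        f"--rwmixread={read_pct}",     # percentage of reads in the mix
        f"--bs={block_size}",
        "--ioengine=libaio",
        "--direct=1",                  # bypass the page cache
        "--iodepth=32",
        f"--size={size}",
        "--time_based",
        f"--runtime={runtime_s}",
        "--output-format=json",
    ]


def total_iops(fio_result):
    """Sum read and write IOPS over all jobs in fio's JSON output."""
    return sum(job["read"]["iops"] + job["write"]["iops"]
               for job in fio_result["jobs"])


def run_benchmark(target_file):
    """Run fio against target_file and return the measured total IOPS."""
    completed = subprocess.run(build_fio_command(target_file),
                               check=True, capture_output=True, text=True)
    return total_iops(json.loads(completed.stdout))
```

Calling `run_benchmark("/mnt/data/fio.test")` (a placeholder path) returns the combined IOPS, which you can compare with what a volume of your planned size provides. Run the test long enough to exhaust any burst credits if you want to see sustained rather than burst performance.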

When to Use HDD-Backed Volumes?

HDD-backed volumes are designed for large sequential workloads. This makes them ideal for data warehousing or big data manipulation and for any application storing large chunks of data, like log keeping.

Throughput is the main performance criterion for HDD-backed volume types. If you don’t need more than the capped 250 MiB/s of throughput, choose the Cold HDD (sc1) volume type. At just $0.028 per GiB, sc1 volumes are a good fit for any non-system partition where you need a lot of space and where S3 mounts are not an option because you need to clone or snapshot the volume. S3 mounts also won’t give you access latency anywhere near as low as EBS. Similar to SSD gp2 volumes, HDD-backed volumes accumulate credits over time. For sc1, credits accrue at 12 MiB/s per provisioned TiB, up to a maximum of 1 TiB in credits.

Throughput Optimized HDD (st1) volumes support higher throughput (a maximum of 500 MiB/s) and also use the credit burst model. With this volume type, however, credits accumulate at a rate of 40 MiB/s per provisioned TiB, meaning you need less time to reach your maximum and can burst to higher throughput much more often than with sc1 volumes. The price per GiB is still relatively low at $0.05, so hosting a large-scale production data warehouse with several terabytes of storage will not be overly expensive.
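To get a feel for the difference between the two accrual rates, the short sketch below estimates how long an idle volume needs to fill its credit bucket. It assumes a 1 TiB bucket for both types (the article states this figure only for sc1) and that the volume performs no I/O while credits accumulate; the function name is illustrative.

```python
MIB_PER_TIB = 1024 * 1024

# Accrual rates quoted in the article, in MiB/s.
ACCRUAL_MIB_S = {"sc1": 12, "st1": 40}


def hours_to_fill_bucket(accrual_mib_s: float, bucket_tib: float = 1.0) -> float:
    """Hours an idle volume needs to accumulate a full credit bucket."""
    seconds = bucket_tib * MIB_PER_TIB / accrual_mib_s
    return seconds / 3600


if __name__ == "__main__":
    for volume_type, rate in ACCRUAL_MIB_S.items():
        print(f"{volume_type}: {hours_to_fill_bucket(rate):.1f} hours")
```

Under these assumptions, sc1 needs roughly 24 hours to refill from empty, while st1 needs a little over 7 hours, which is why st1 can burst far more often.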

Summary

Before deploying your EC2 instances, no matter their purpose, you should first acquaint yourself with the requirements of your application (in terms of performance criteria, IOPS or throughput) and spec out the required free space. This will help you decide on the EBS volume type, the size you need to provision, and whether to choose a cheaper option (sc1 for HDD-backed, gp2 for SSD-backed) or a more expensive, performance-oriented volume type (st1 for HDD-backed, io1 for SSD-backed). The maximum size for all EBS volumes is the same (16 TiB), but because of the big differences in pricing, it is more practical to use HDD-backed volumes for larger, secondary partitions and to use SSD-backed ones only when you need to achieve a specific target IOPS on a volume. Besides fio, CloudWatch alarms that track I/O latency can also help you effectively measure the performance your application needs. The general AWS rule you have most likely read several times already also applies here: start small, research, benchmark, and then upgrade your volumes appropriately. That is the best way to reach the required performance without spending a fortune on your instance storage.
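The decision process summarized above can be written down as a rough rule of thumb. The sketch below is only an illustration of the reasoning in this article, using the per-volume limits quoted in it; the function name, the workload labels, and the returned strings are hypothetical, and burst credits and pricing are deliberately ignored.

```python
def suggest_volume_type(workload, boot_volume=False, target_iops=0,
                        target_mib_s=0, size_gib=0):
    """Suggest an EBS volume type from the limits quoted in the article.

    workload: "transactional" (many small I/Os) or "sequential" (large I/Os).
    """
    # Boot volumes and IOPS-driven workloads go on SSD-backed volumes.
    if boot_volume or workload == "transactional":
        if target_iops <= min(3 * size_gib, 10_000):
            return "General Purpose SSD"
        if target_iops <= 20_000:
            return "Provisioned IOPS SSD (io1)"
        return "multiple io1 volumes striped with RAID 0"
    # Throughput-driven workloads go on HDD-backed volumes.
    if workload == "sequential":
        if target_mib_s <= 250:
            return "Cold HDD (sc1)"
        if target_mib_s <= 500:
            return "Throughput Optimized HDD (st1)"
        raise ValueError("target exceeds the single-volume throughput quoted here")
    raise ValueError(f"unknown workload type: {workload!r}")
```

For instance, `suggest_volume_type("transactional", target_iops=5000, size_gib=500)` points to io1, because a 500 GiB General Purpose SSD volume only provides a 1,500 IOPS baseline. Treat the result as a starting hypothesis to confirm with a benchmark, not as a final answer.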

Learn more about N2WS’ Cloud Protection Manager, our snapshot-based block solution utilizing the most efficient backup available. CPM automates backup and recovery for EC2 instances, EBS volumes, RDS, Redshift, and Aurora clusters by extending and enhancing native Amazon snapshots.