In part one, I described the EBS snapshot mechanism. In this part, I will show how to calculate EBS snapshot costs, either as a rough estimate or as a more accurate analysis using monitoring tools. You can also check out this post for more on AWS EBS pricing.
To estimate how large your EBS snapshots will be, you need to know how much your volumes change. One way is to guesstimate, using a simple rule of thumb that is common in backup planning: a typical data volume on a production server changes by about 3% a day. Let’s try to calculate the cost. Assume a 1TB EBS volume that is 70% full at first. We take daily snapshots and keep them for 30 days. The first full snapshot will take 700GB (70% of 1TB). For the incremental snapshots, we multiply 30 (days) by 30GB (3% of 1TB) and get 900GB. Add them together and we reach about 1.6TB of total snapshot storage. AWS compresses snapshots when they are stored in S3, but it is hard to estimate how much compression will reduce the data.
If the compression is zip-like and the data on the EBS volume consists mostly of text files, it will compress very well. On the other hand, if the data on the volume is already compressed (e.g. a compressed file system or media files), it will hardly compress at all. You can decide not to factor compression into your calculation, or assume at most a 2:1 ratio. The cost of storing 1GB of EBS snapshot data is $0.095/month (Virginia region, February 2013). For 1,600GB the price will be $152/month. If we assume compression is effective, it will be half that: $76/month. Accurate? No. Something we can work with? Maybe…
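The rough estimate above can be written as a small calculation. This is only a sketch of the same arithmetic: the function name and parameters are made up for illustration, the 3% daily change is the rule of thumb (not a measurement), and the 2:1 compression ratio is an assumption rather than a figure published by AWS.

```python
def estimate_snapshot_cost(volume_gb, used_fraction, daily_change_fraction,
                           retention_days, price_per_gb_month,
                           compression_ratio=1.0):
    """Return (stored_gb, monthly_cost) for a rule-of-thumb estimate."""
    # First full snapshot: only the occupied part of the volume.
    full_gb = volume_gb * used_fraction
    # One incremental per retained day, each sized by the assumed daily change.
    incremental_gb = volume_gb * daily_change_fraction * retention_days
    # Compression ratio is a guess: 1.0 means "ignore compression".
    stored_gb = (full_gb + incremental_gb) / compression_ratio
    return stored_gb, stored_gb * price_per_gb_month


# The example from the text: 1TB volume, 70% full, 3% change/day, 30 days,
# $0.095 per GB-month (Virginia region, February 2013).
stored, cost = estimate_snapshot_cost(1000, 0.70, 0.03, 30, 0.095)
print(f"{stored:.0f} GB, ${cost:.2f}/month")

# Same inputs with an assumed 2:1 compression ratio.
stored_c, cost_c = estimate_snapshot_cost(1000, 0.70, 0.03, 30, 0.095,
                                          compression_ratio=2.0)
print(f"{stored_c:.0f} GB, ${cost_c:.2f}/month")
```

Changing the retention period or the daily change percentage shows quickly how sensitive the estimate is to those two guesses.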
A more accurate method
If you need a more accurate way of knowing how much your EBS volumes are changing, you can sample them. To do that, you can install software that monitors your disk changes and reports them to you. Take a large enough sample at typical times, and you can get a very good idea of how much any specific EBS volume is changing. For Windows instances you can use the built-in Windows tool, Performance Monitor (simply type perfmon in the Run dialog). `Perfmon` can give you the average number of bytes written per second; just add the logical disk counters.
Another tool is Disk Monitor, originally written by SysInternals and now available from Microsoft’s site. It can monitor writes to disk and log them to a file that can later be imported into a spreadsheet. You can download it from here: http://technet.microsoft.com/en-us/sysinternals/bb896646.aspx . On Linux instances, you can use iotop, an open source command-line tool written in Python (http://guichaz.free.fr/iotop/).
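Once you have an average write rate from one of these tools, you can turn it into a daily change figure. The sketch below assumes you already have an average in bytes written per second (for example from Perfmon); the function names are hypothetical. Keep in mind that bytes written is an upper bound on changed data, since blocks that are rewritten several times in a day are counted each time.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_change_gb(avg_write_bytes_per_sec):
    """Convert an average write rate into GB written per day (1 GB = 10**9 bytes)."""
    return avg_write_bytes_per_sec * SECONDS_PER_DAY / 1e9


def daily_change_fraction(avg_write_bytes_per_sec, volume_gb):
    """Daily writes as a fraction of the volume size (an upper bound on change)."""
    return daily_change_gb(avg_write_bytes_per_sec) / volume_gb


# Hypothetical sample: an average of 350,000 bytes written per second
# on a 1TB volume.
rate = 350_000
print(f"{daily_change_gb(rate):.2f} GB/day")
print(f"{daily_change_fraction(rate, 1000):.1%} of the volume per day")
```

A sustained rate of roughly 350KB per second is what the 3% rule of thumb corresponds to on a 1TB volume, which gives a feel for how modest that write rate is.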
Write patterns and how they affect snapshot size
Write IO patterns affect the amount of storage your snapshots will take. Let’s take an example: an EBS volume holds 1GB of data, and every day there is a 1GB change on the volume. The first full snapshot will take 1GB of snapshot storage space, and every daily incremental will also take 1GB. Now let’s assume we keep snapshots for 10 days and delete any older ones. If every 1GB is written to new, unused blocks on the volume (e.g. new static files are written and older ones don’t change), then the snapshot data will grow by 1GB every day forever (or until the EBS volume is full).
Deleting old snapshots won’t matter, because the blocks they hold are still part of the volume and are still referenced by the newer snapshots, so they must be kept. After 10 days you will have 10GB of snapshot data, and after 100 days 100GB. Now let’s assume the other extreme: there is only 1GB of occupied space on this EBS volume, and every day that same 1GB is overwritten (e.g. a bit like a database file that changes a lot but does not necessarily grow). In this case, you will have 10GB of snapshot data after 10 days, but after 100 days you will still have 10GB of snapshot data, because older snapshots are deleted.
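The two extremes can be checked with a toy model. This is a simplified sketch, not how AWS implements snapshots: it treats the volume as 1GB blocks, takes one snapshot per day, keeps the last `retention` snapshots, and counts every distinct block version that a retained snapshot still references. The function name and parameters are made up for illustration.

```python
def snapshot_storage_gb(days, retention, overwrite):
    """Toy model: GB of snapshot data kept after `days` daily snapshots.

    Each block is 1GB. With overwrite=True the same block is rewritten
    every day; otherwise each day writes a new, previously unused block.
    """
    volume = {}      # block index -> version (day it was last written)
    snapshots = []   # each snapshot is the set of (block, version) it references
    for day in range(days):
        block = 0 if overwrite else day
        volume[block] = day
        snapshots.append(frozenset(volume.items()))
        # Delete snapshots older than the retention window.
        snapshots = snapshots[-retention:]
    # A block version is stored once, however many snapshots reference it.
    stored = set().union(*snapshots)
    return len(stored)


for days in (10, 100):
    print(days, "days, new blocks:  ", snapshot_storage_gb(days, 10, False), "GB")
    print(days, "days, overwritten: ", snapshot_storage_gb(days, 10, True), "GB")
```

In the append case the latest snapshot alone references every block ever written, which is why deleting older snapshots frees nothing.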
The number of snapshots doesn’t necessarily matter
We keep talking about a daily change. How does the frequency of snapshot-taking fit into that? Well, that depends. You can take one snapshot a day or take six. As long as blocks are not written and then rewritten within the same day, it doesn’t matter: one or six snapshots will use the same amount of storage space, and therefore will cost the same. This is a very significant conclusion when configuring a backup solution for your EBS volumes: you can take snapshots at a higher resolution without increasing the cost, giving you a better RPO (Recovery Point Objective). In reality, things will probably not be that “clean,” but in a typical application most data will probably not be rewritten all the time, and in most cases you will be able to take more frequent snapshots without affecting your AWS bill by much.
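The same kind of toy model illustrates the point about frequency. As before, this is a simplified sketch with hypothetical names: one day's writes are given as a list of block indices in write order, snapshots are taken at evenly spaced points during the day, and storage is counted as distinct block versions captured by at least one snapshot.

```python
def stored_block_versions(day_writes, snapshots_per_day):
    """Count distinct (block, version) pairs captured by one day's snapshots."""
    if snapshots_per_day <= 0 or len(day_writes) % snapshots_per_day != 0:
        raise ValueError("writes must divide evenly among the snapshots")
    step = len(day_writes) // snapshots_per_day
    volume = {}     # block index -> sequence number of its latest write
    stored = set()  # block versions referenced by at least one snapshot
    for seq, block in enumerate(day_writes, start=1):
        volume[block] = seq
        if seq % step == 0:
            # Take a snapshot: it references the current version of every block.
            stored.update(volume.items())
    return len(stored)


distinct_writes = [0, 1, 2, 3, 4, 5]   # six different blocks, each written once
hot_block = [0, 0, 0, 0, 0, 0]         # the same block rewritten six times

print(stored_block_versions(distinct_writes, 1),
      stored_block_versions(distinct_writes, 6))
print(stored_block_versions(hot_block, 1),
      stored_block_versions(hot_block, 6))
```

When each block is written only once in the day, one snapshot and six snapshots capture the same set of block versions. When a single block is rewritten between snapshots, every extra snapshot captures another version of it, which is the case where higher frequency does cost more.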
Currently, you can only estimate how much S3 storage space your snapshots take. To help plan your budgets and to use your EC2 backup solution more effectively, you can estimate the amount and pattern of changes of your EBS volumes, by making assumptions or by sampling them.