The rapid growth of cloud migration from on-premises environments has significantly changed how environments are deployed and managed. Resources are now far more flexible to work with, and this post focuses on cloud storage, specifically Amazon S3, Amazon’s primary storage solution.
The widespread availability of public clouds like AWS has led many enterprises to move away from on-premises data centers and, in doing so, to change how they deploy and manage their environments. Amazon S3 also comes with some excellent storage management tools that you can put to work for your business.
What is Amazon S3 Storage?
Amazon Simple Storage Service (S3) is Amazon’s highly available, scalable, and durable (99.999999999%) object storage, designed for customers of all sizes and requirements. It was one of the first services released for the AWS cloud and has been a core product ever since: almost all AWS customers use it, and many other AWS services rely heavily on it. Amazon S3 can be used for archiving, backup and restore, and big data analytics, as well as for IoT devices, mobile, and other applications. Colder storage options within S3 are becoming the norm as more and more compliance regulations dictate long-term storage requirements.
S3 offers multiple storage classes to choose from. For general-purpose storage, use S3 Standard. Priced at $0.023 per GB stored per month, it is designed for high availability and durability of frequently accessed files, and can sustain the loss of an entire Availability Zone if a disaster occurs.
If you have data that is accessed less often but still needs to be readily available on demand, S3 Infrequent Access (IA) offers good value. S3 IA comes in two variants: Standard-IA costs $0.0125 per GB per month, and One Zone-IA, which stores data in a single AZ instead of replicating it across three, costs only $0.01 per GB per month.
S3 Intelligent-Tiering is a special class that allows you to automatically move your data to the storage class that is most cost-effective, without any operational overhead.
If you need low-cost cold storage, S3 Glacier provides a secure and durable solution for long-term archival. It is priced at only $0.004 per GB per month and offers multiple retrieval options, each with its own cost: you can get your data within a few minutes using expedited retrieval (at a higher cost), or opt for the cheaper standard option (3 to 5 hours) or the cheapest bulk option (5 to 12 hours).
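The trade-off between retrieval speed and cost can be sketched as a small helper that picks the cheapest tier expected to finish within a deadline. This is a minimal sketch, not a definitive implementation: the function names and the 7-day default are hypothetical, the thresholds simply reuse the upper bounds quoted above, and `client` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`) supplied by the caller.

```python
def choose_retrieval_tier(max_wait_hours: float) -> str:
    """Pick the cheapest Glacier retrieval tier that should finish in time.

    Thresholds are the upper bounds quoted in the article, not guarantees.
    """
    if max_wait_hours >= 12:
        return "Bulk"       # cheapest, roughly 5 to 12 hours
    if max_wait_hours >= 5:
        return "Standard"   # roughly 3 to 5 hours
    return "Expedited"      # minutes, at a higher cost


def request_restore(client, bucket: str, key: str,
                    max_wait_hours: float, keep_days: int = 7) -> str:
    """Ask S3 to restore an archived object; returns the tier that was used.

    `client` is assumed to be a boto3 S3 client. `keep_days` (hypothetical
    default) is how long the restored copy stays available.
    """
    tier = choose_retrieval_tier(max_wait_hours)
    client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={
            "Days": keep_days,
            "GlacierJobParameters": {"Tier": tier},
        },
    )
    return tier
```

With a real client, a call such as `request_restore(client, "example-archive-bucket", "reports/archive.tar", max_wait_hours=8)` would request a standard retrieval; the bucket and key names here are placeholders.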
More recently, Amazon introduced S3 Glacier Deep Archive, its cheapest storage class ($0.00099 per GB per month), meant for archiving data that may be accessed only once or twice a year. It is designed for organizations that need very long-term data retention for regulatory compliance (healthcare, the public sector, and so on), but it can also be used for backup and disaster recovery as a cheaper alternative to standard S3 Glacier.
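To make the price differences concrete, the per-GB prices quoted in this section can be turned into a quick monthly estimate. The sketch below is illustrative only: it covers storage charges alone (retrieval, request, and minimum-duration charges are deliberately left out), it reuses the article's figures, which may not match current or regional pricing, and the function and constant names are made up for this example.

```python
from decimal import Decimal

# Per-GB monthly storage prices as quoted in this article (assumed, may be outdated).
PRICE_PER_GB_MONTH = {
    "S3 Standard": Decimal("0.023"),
    "S3 Standard-IA": Decimal("0.0125"),
    "S3 One Zone-IA": Decimal("0.01"),
    "S3 Glacier": Decimal("0.004"),
    "S3 Glacier Deep Archive": Decimal("0.00099"),
}


def monthly_storage_cost(size_gb: int) -> dict:
    """Return the storage-only monthly cost of `size_gb` GB in each class."""
    return {name: price * size_gb for name, price in PRICE_PER_GB_MONTH.items()}


if __name__ == "__main__":
    # Compare the classes for 1,000 GB of data.
    for name, cost in monthly_storage_cost(1000).items():
        print(f"{name:<24} ${cost:.2f}")
```

For 1,000 GB this works out to $23.00 per month in S3 Standard versus $0.99 in S3 Glacier Deep Archive, which is why access patterns matter so much when choosing a class.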
Amazon S3 Storage Management Tools
When working with Amazon S3, you have various tools at your disposal. Some give you detailed insight into your storage usage and access patterns, while others let you automate processes such as lifecycle management or data replication. Let’s look at some of the tools you might find useful for your company’s needs.
S3 Object Tagging
Tagging resources on AWS is a crucial part of both management and access control. You can use these key-value pairs (each S3 object can have up to 10 tags associated with it) for cost analysis (when breaking down spending in Cost Explorer), for limiting access through IAM permissions, or for automating parts of your infrastructure. For Amazon S3 objects, tags can also be used to set up lifecycle policies, filter metrics, or scope S3 Analytics.
As an example, you might want to have a logical separation between various teams working on different projects within your company, so you tag each bucket appropriately. You can later use this to have a clear view of each team’s spending, or you can rely on these tags to expire desired objects, since each project could have different retention requirements.
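The per-project retention idea can be sketched as a tag-filtered lifecycle rule. This is a minimal sketch under stated assumptions: the `project` tag key, the rule ID format, and the function names are hypothetical, and `client` is assumed to be a boto3 S3 client passed in by the caller.

```python
def expiration_rule(project: str, days: int) -> dict:
    """Build a lifecycle rule that expires objects tagged project=<project>."""
    return {
        "ID": f"expire-{project}-after-{days}d",   # hypothetical naming scheme
        "Filter": {"Tag": {"Key": "project", "Value": project}},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def apply_retention(client, bucket: str, retention_days: dict) -> dict:
    """Apply one expiration rule per project, e.g. {"alpha": 90, "beta": 365}.

    `client` is assumed to be a boto3 S3 client. Note that this call replaces
    any lifecycle configuration already set on the bucket.
    """
    config = {"Rules": [expiration_rule(p, d) for p, d in retention_days.items()]}
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )
    return config
```

Keeping the rule builder separate from the API call makes it easy to review the generated rules before they are applied.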
Tags can be added through the web console, the command line interface (CLI), or the S3 API. They are not free, but at only $0.01 per 10,000 tags per month they are cheap enough that you should use them to your benefit.
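As an illustration of tagging through the API, the sketch below converts a plain dictionary into the tag-set structure that S3 expects and checks the 10-tags-per-object limit first. The helper names are hypothetical, and `client` is assumed to be a boto3 S3 client provided by the caller.

```python
MAX_TAGS_PER_OBJECT = 10  # S3 limit on tags per object


def to_tag_set(tags: dict) -> list:
    """Convert {"team": "data"} into [{"Key": "team", "Value": "data"}]."""
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(
            f"an S3 object can carry at most {MAX_TAGS_PER_OBJECT} tags, got {len(tags)}"
        )
    return [{"Key": key, "Value": value} for key, value in tags.items()]


def tag_object(client, bucket: str, key: str, tags: dict) -> None:
    """Set the tags on one object.

    `client` is assumed to be a boto3 S3 client. This call replaces the
    object's existing tag set rather than merging into it.
    """
    client.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": to_tag_set(tags)},
    )
```

Because the call overwrites existing tags, a real workflow would typically read the current tags first and merge them before writing.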
S3 Analytics
Another very useful tool is S3 Analytics. It lets you analyze the retrieval patterns of your objects, and you can use the results to choose the most suitable storage class.
S3 Analytics allows you to run the analysis on only the desired data within your bucket. You can specify a prefix, use tags for the same purpose, or combine both, which gives you plenty of flexibility.
The analytics are updated daily and provide a detailed graphical overview of storage and of data retrieved over time. You can view the S3 Analytics results in the web console, or ingest them into a business intelligence (BI) tool like Amazon QuickSight to inspect them more deeply if required.
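The prefix and tag scoping described in this section can be sketched as a filter builder for an analytics configuration. Treat this as an assumption-laden sketch: the function names are hypothetical, only a single tag is supported for brevity, `client` is assumed to be a boto3 S3 client, and the empty `StorageClassAnalysis` block assumes no data export is wanted.

```python
from typing import Optional, Tuple


def analytics_filter(prefix: Optional[str] = None,
                     tag: Optional[Tuple[str, str]] = None) -> Optional[dict]:
    """Build the filter for a prefix, a single (key, value) tag, or both."""
    if prefix and tag:
        return {"And": {"Prefix": prefix,
                        "Tags": [{"Key": tag[0], "Value": tag[1]}]}}
    if tag:
        return {"Tag": {"Key": tag[0], "Value": tag[1]}}
    if prefix:
        return {"Prefix": prefix}
    return None  # no filter: analyze the whole bucket


def analytics_configuration(config_id: str,
                            prefix: Optional[str] = None,
                            tag: Optional[Tuple[str, str]] = None) -> dict:
    """Assemble the configuration body; the filter is omitted when empty."""
    config = {"Id": config_id, "StorageClassAnalysis": {}}
    flt = analytics_filter(prefix, tag)
    if flt is not None:
        config["Filter"] = flt
    return config


def enable_analytics(client, bucket: str, config: dict) -> None:
    """Register the configuration; `client` is assumed to be a boto3 S3 client."""
    client.put_bucket_analytics_configuration(
        Bucket=bucket,
        Id=config["Id"],
        AnalyticsConfiguration=config,
    )
```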
S3 Inventory
Many AWS customers store large numbers of objects across various S3 buckets, and it is not unusual to have tens of millions of objects in a single bucket. Listing the contents of a bucket with a LIST operation for reporting or other purposes works well up to a point, but it becomes a problem with very large buckets because LIST returns at most 1,000 objects per request. S3 Inventory can speed up this workflow significantly. It delivers a daily or weekly CSV report of the contents of your bucket, covering either all objects in the bucket or only those under a given prefix. You can also include optional fields such as the storage class, size, and replication status of your objects.
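To see why LIST becomes a bottleneck, it helps to count the requests involved. The sketch below estimates the number of LIST calls for a bucket of a given size and shows what page-by-page listing looks like. The function names are hypothetical, and `client` is assumed to be a boto3 S3 client whose `list_objects_v2` paginator handles the continuation tokens.

```python
import math

LIST_PAGE_SIZE = 1000  # maximum number of objects returned per LIST request


def list_requests_needed(object_count: int) -> int:
    """Minimum number of LIST requests needed to enumerate a bucket."""
    return max(1, math.ceil(object_count / LIST_PAGE_SIZE))


def iter_keys(client, bucket: str, prefix: str = ""):
    """Yield every key under `prefix`, one LIST page at a time.

    `client` is assumed to be a boto3 S3 client.
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

A bucket with 50 million objects needs at least 50,000 sequential LIST requests to enumerate this way, which is the workload an inventory report replaces.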
S3 Inventory also gives you more information about your objects than a LIST operation does: delete markers, replication status, and multipart upload flags are all shown. Reports can be delivered to another bucket, or even to another AWS account if your workflow calls for it. On top of that, S3 Inventory saves money, since it is 50% cheaper than running the equivalent LIST operations.
You can use S3 Inventory reports to drive data audits, offline analytics, or even garbage collection for a secondary index.
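An inventory report with the options mentioned above (daily or weekly schedule, optional prefix, CSV output, extra fields) could be configured along these lines. This is a sketch rather than a complete setup: the function names are hypothetical, `client` is assumed to be a boto3 S3 client, and the destination bucket is assumed to already have a bucket policy that lets S3 write reports into it.

```python
from typing import Optional


def inventory_configuration(config_id: str,
                            destination_bucket_arn: str,
                            prefix: Optional[str] = None,
                            frequency: str = "Daily") -> dict:
    """Build an inventory configuration for CSV reports."""
    if frequency not in ("Daily", "Weekly"):
        raise ValueError("frequency must be 'Daily' or 'Weekly'")
    config = {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": frequency},
        # Optional fields called out in the article.
        "OptionalFields": ["Size", "StorageClass", "ReplicationStatus"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": destination_bucket_arn,  # e.g. "arn:aws:s3:::example-reports"
                "Format": "CSV",
            }
        },
    }
    if prefix:
        config["Filter"] = {"Prefix": prefix}  # report on part of the bucket only
    return config


def enable_inventory(client, source_bucket: str, config: dict) -> None:
    """Register the report; `client` is assumed to be a boto3 S3 client."""
    client.put_bucket_inventory_configuration(
        Bucket=source_bucket,
        Id=config["Id"],
        InventoryConfiguration=config,
    )
```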
S3 Cross Region Replication
As mentioned, Amazon S3 provides very durable and highly available storage. But that is sometimes not enough. Maybe you have to comply with specific standards, or you simply want to protect yourself from a single point of failure. Disasters do occur occasionally, so if your business depends on the data in your buckets, it might be a good idea to replicate them to another region (remember, by default S3 only replicates data within a single region). Cross-region replication lets you set up an asynchronous object copy quickly and easily: after enabling versioning on both the source and the target bucket, you can choose to replicate either the entire bucket or a specific subset based on a prefix.
Amazon has also announced selective replication based on tags, which simplifies the process and allows even more granular filtering when copying data to another region. If you rely heavily on cross-region replication, make sure to look into a replication monitor. This solution automatically checks replication status and provides near real-time metrics, as well as notifications when failures occur.
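The setup described in this section (versioning on both buckets, then a rule scoped to the whole bucket, a prefix, or a tag) can be sketched as follows. The function names and the single-tag restriction are choices made for this example, both clients are assumed to be boto3 S3 clients for the respective regions, and the IAM role ARN is assumed to exist already with the permissions replication needs.

```python
from typing import Optional, Tuple


def replication_rule(rule_id: str, destination_bucket_arn: str, priority: int,
                     prefix: Optional[str] = None,
                     tag: Optional[Tuple[str, str]] = None) -> dict:
    """Build one replication rule for a prefix, a (key, value) tag, or both."""
    if prefix and tag:
        flt = {"And": {"Prefix": prefix,
                       "Tags": [{"Key": tag[0], "Value": tag[1]}]}}
    elif tag:
        flt = {"Tag": {"Key": tag[0], "Value": tag[1]}}
    else:
        flt = {"Prefix": prefix or ""}  # empty prefix matches the entire bucket
    return {
        "ID": rule_id,
        "Priority": priority,
        "Status": "Enabled",
        "Filter": flt,
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "Destination": {"Bucket": destination_bucket_arn},
    }


def enable_replication(source_client, destination_client,
                       source_bucket: str, destination_bucket: str,
                       role_arn: str, rules: list) -> None:
    """Turn on versioning for both buckets, then attach the replication rules."""
    versioning = {"Status": "Enabled"}
    source_client.put_bucket_versioning(
        Bucket=source_bucket, VersioningConfiguration=versioning)
    destination_client.put_bucket_versioning(
        Bucket=destination_bucket, VersioningConfiguration=versioning)
    source_client.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration={"Role": role_arn, "Rules": rules},
    )
```

Separate clients are used because the two buckets live in different regions; with rule priorities, several rules (for example one per tag) can coexist on the same source bucket.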
Summary
S3 provides an extremely flexible storage solution in the cloud. With very high durability, scalability, and availability, as well as multiple storage classes that each bring their own cost savings, it is built to cover a wide range of requirements and use cases. The available storage tools, which help with everything from cost savings to automation and disaster recovery, make Amazon S3 even more attractive, so make sure you look into them; you will likely find them beneficial for your cloud environment.