Almost all businesses work with information that is stored digitally as data, and every piece of that data has a life cycle. For example, your business might ingest data from a sensor, store it in some sort of database, process or analyze it, and then destroy it when it is no longer needed. But as the volume of your data grows, which it typically does at an exponential rate, so does the complexity of working with it. This is where data lifecycle management (DLM) comes in.
Data lifecycle management defines policies that govern where data is stored, for how long, and who has access to it, among other things. A data lifecycle management policy not only increases efficiency but also facilitates better data protection, which in turn makes it easier for companies to comply with the many standards imposed on businesses today.
This two-part article will look at the benefits and challenges of data lifecycle management within the AWS environment. Part 1 will examine the first two stages of DLM: data collection and data storage.
The Stages of Data Lifecycle Management
There are four stages involved with data lifecycle management. While the definition of these stages can vary, the principles behind them do not. You always need to collect the data, store it somewhere, use it for something, and archive or destroy it at the end of its lifecycle. Let’s go over these stages in detail.
Data Collection
Data can come from sources within AWS as well as from the outside world. Data management is relatively easy if your environment is fully contained on AWS. However, many businesses need to ingest data from on-premises servers (when they’re running hybrid cloud environments, for example), sensors spread across a variety of locations, or even a client’s data center. In these cases, security is an issue. Whether you are concerned with competition, compliance requirements, or simply malicious actors, transferring any data (especially mission-critical data) over the open internet should be taken very seriously.
It’s crucial to use encryption in transit. If you stay entirely within the AWS environment, most services encrypt data traveling from one service to another by default, or can easily be configured to do so. If you are sending data to AWS from the outside, many services also support encryption in transit. For example, your data uploads to S3 can use SSL/TLS endpoints over the HTTPS protocol. Additionally, you can rely on the AWS Certificate Manager service, which lets you provision and manage the SSL/TLS certificates used to encrypt data in transit.
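As an illustration, a common way to enforce encryption in transit for S3 is a bucket policy that denies any request made over plain HTTP. The sketch below is a minimal example using boto3 (the AWS SDK for Python); the bucket name is a placeholder, and you would adapt the policy to your own naming and account setup.

```python
import json
import boto3

s3 = boto3.client("s3")

BUCKET = "my-ingest-bucket"  # hypothetical bucket name

# Deny every request that does not arrive over TLS (aws:SecureTransport = false)
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```

With this policy attached, any unencrypted upload attempt is rejected outright, so the DLM requirement is enforced by the platform rather than by convention.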
Another key tool to implement is the VPN. By creating an encrypted tunnel between the remote location and your cloud environment, you protect your incoming data and stay compliant with the various regulations frequently imposed on organizations. You can set up and manage the VPN yourself using something like a self-hosted OpenVPN server, or you can use the managed AWS VPN services, which offer both Site-to-Site VPN and Client VPN solutions.
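For the managed route, setting up an AWS Site-to-Site VPN comes down to a handful of API calls. The sketch below is a minimal outline using boto3; the public IP, ASN, and VPC ID are placeholders for your own on-premises gateway and AWS network.

```python
import boto3

ec2 = boto3.client("ec2")

# Represent the on-premises VPN device (placeholder IP and ASN)
cgw = ec2.create_customer_gateway(
    Type="ipsec.1",
    PublicIp="203.0.113.12",   # public IP of your on-premises router
    BgpAsn=65000,
)["CustomerGateway"]

# Create a virtual private gateway and attach it to the VPC receiving the data
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]
ec2.attach_vpn_gateway(
    VpcId="vpc-0123456789abcdef0",  # placeholder VPC ID
    VpnGatewayId=vgw["VpnGatewayId"],
)

# Create the Site-to-Site VPN connection (static routing, for simplicity)
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId=cgw["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGatewayId"],
    Options={"StaticRoutesOnly": True},
)["VpnConnection"]

print("VPN connection created:", vpn["VpnConnectionId"])
```

From there, you would download the connection's configuration for your on-premises device and add the appropriate routes, after which all ingested data travels through the encrypted tunnel.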
AWS Direct Connect is another useful service. It allows you to set up dedicated private lines that run from various on-premises locations directly to AWS, completely bypassing the public internet. This eliminates a lot of security issues.
Depending on the use case and the needs of your business, your DLM policies might require you to use a specific encryption method or ingest all data from the outside through a VPN, for example. Thankfully, AWS offers a variety of options through their managed services. These can help you improve data management as well as overall safety.
Data Storage
Data storage is probably the lifecycle management stage that presents the most challenges. Moving a massive amount of data to your storage solution requires you to think about and plan for various factors.
Estimating and Provisioning Storage Capacity
Historically, storage capacity planning has been a very tedious task. Even with a somewhat predictable ingress of data (which is rarely the case), it has always been necessary to create a detailed strategy for provisioning storage hardware. After getting your estimates, you would still have to put in the order for the storage, wait for it to arrive, install the physical components, and, finally, set up everything to work with your environment. Back in the data center days, this was a process that took a couple of days at best. More commonly, it took a few weeks.
With AWS, all of these procedures are handled with a couple of clicks, since Amazon provides you with virtually unlimited storage capacity in the cloud and reduces the provisioning process to mere minutes—or even seconds.
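To give a sense of how lightweight provisioning becomes, the sketch below creates an S3 bucket and a 100 GiB EBS volume with a few boto3 calls; the names, region, and availability zone are placeholders you would swap for your own.

```python
import boto3

REGION = "eu-west-1"  # placeholder region

# Object storage: an S3 bucket is available within seconds of this call
s3 = boto3.client("s3", region_name=REGION)
s3.create_bucket(
    Bucket="my-analytics-data-bucket",  # hypothetical name, must be globally unique
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Block storage: a 100 GiB gp3 volume, ready to attach to an instance
ec2 = boto3.client("ec2", region_name=REGION)
volume = ec2.create_volume(
    AvailabilityZone=f"{REGION}a",
    Size=100,
    VolumeType="gp3",
)
print("Created volume:", volume["VolumeId"])
```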
The Cost of Storage
Moving to the cloud has also changed the way businesses pay for their storage needs. Whereas before you had to invest huge amounts of money up front in expensive hardware, now, thanks to the operating expenses (pay-as-you-go) model, you only pay for what you actually use. Need 50 GB of object storage this month? Use it, and pay for it. Need 20 TB the following month? No problem; you’ll be charged at the end of the month. No strategies for buying expensive hardware, no storage estimation, no long-term budget planning; simply use whatever you need. With the decreasing costs of AWS storage (as well as its other services), on-premises hardware solutions simply cannot compete with cloud storage options.
Ensuring Durability of Data Through Disaster Recovery
Your DLM policies will almost certainly need to cover disaster recovery (DR). Disasters do happen sooner or later, and it is of utmost importance to be prepared for them by having a proper backup plan in place, as well as a disaster recovery strategy. Employees need to be ready for the unexpected, so disaster recovery drills should be conducted often. They’re the only way to determine whether or not your business is actually prepared to come back from an undesirable event.
Here again, AWS provides lots of benefits. First off, by creating multiple accounts, you can create logical segmentation and do cross-account DR, ensuring that even if your entire account has been compromised, your data is safely stored away. Also, the ability to quickly and very cheaply bring up new infrastructure means that you can perform DR drills more often and at a very low cost.
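One common building block for cross-account DR is sharing EBS snapshots with a separate recovery account, which can then copy them into storage it controls. The sketch below assumes a hypothetical snapshot ID and recovery account ID; in practice, the recovery account would also make its own copy so the data survives even if the source account is lost.

```python
import boto3

ec2 = boto3.client("ec2")

SNAPSHOT_ID = "snap-0123456789abcdef0"   # hypothetical snapshot in the primary account
DR_ACCOUNT_ID = "111122223333"           # hypothetical recovery account

# Grant the recovery account permission to use (and copy) the snapshot
ec2.modify_snapshot_attribute(
    SnapshotId=SNAPSHOT_ID,
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=[DR_ACCOUNT_ID],
)

# From the recovery account, the shared snapshot would then be copied, for example:
# ec2_dr.copy_snapshot(
#     SourceRegion="eu-west-1",
#     SourceSnapshotId=SNAPSHOT_ID,
#     Description="Cross-account DR copy",
# )
```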
A sample DLM policy might state that all data is to be stored in the S3 Standard storage class, although, if cost is a concern and the use case allows it, you might use S3 Standard-Infrequent Access (Standard-IA) or another storage class. Your policy might also specify the number of data backups that need to be made (with critical data being backed up more regularly) as well as how often DR drills must be conducted.
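Such a policy maps almost directly onto an S3 lifecycle configuration. The sketch below, again using boto3 with a placeholder bucket name, transitions objects to Standard-IA after 30 days and expires them after a year; the exact thresholds are illustrative and would come from your own DLM policy rather than from AWS.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-analytics-data-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "dlm-sample-policy",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply the rule to all objects
                # Move to the cheaper Standard-IA class after 30 days
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                # Destroy the data at the end of its lifecycle (365 days, illustrative)
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```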
How Data Lifecycle Management Can Drive Success
Data lifecycle management can help you drive success by enforcing various policies for the ingestion, storage, analysis, and disposal of your data. While every company has different data requirements, both internal and external, properly defining and later following DLM policies will ensure the cost-efficient and secure flow of data throughout your cloud environment.
Working with AWS allows you access to a vast infrastructure and a multitude of supporting services which can make DLM much easier. Within AWS, your policies can push for more security and compliance and, at the same time, reduce your overall costs.
This article was just the beginning of our DLM discussion. In Part 2 of this series, we’ll talk about data usage, archiving, and destruction. We’ll also introduce you to N2WS Backup and Recovery, a modern cloud-native backup tool that can help with multiple stages of the DLM process.
Laurent is a Senior System Engineer at N2WS and AWS Certified Solutions Architect with more than 10 years of experience. (He's also both bilingual and the lead singer of a French rock band in the UK, making him très cool.)