Oracle on EC2: Backup using EBS snapshots and CPM

Introduction:

Oracle is probably the relational database mostly associated with enterprises. Oracle is also widely used in the AWS cloud. Oracle is offered via the RDS service, but many AWS users also run Oracle on EC2 instances. When doing so and storing the data on EBS volumes, the most powerful solution would be to back up these databases using EBS snapshots.

Oracle & snapshots:

Oracle has many methods for backup, switching log files, creating checkpoints and whatnot. Since snapshots create a point-in-time image of disks, there is no need for all these methods. If the backed up image is the exact image of the original server at a consistent and well-defined point-in-time, it should just work. So, basically all that needs to be done is to put Oracle into “backup mode” before taking the snapshots and then releasing it after. Why is it a powerful solution? There are several reasons, but the most compelling one would be recovery time, or RTO (Recovery Time Objective). If you have an Oracle database of a considerable size, say hundreds of GiBs, restoring a database can take a long time, since it requires copying a large amount of data, while recovery using snapshots is almost immediate.

In Oracle’s official documentation, there’s a warning not to use snapshots as a backup solution. Why is that? Because when they say “snapshots,” they mean using hardware snapshots of storage arrays in traditional datacenters. Such snapshots are very fast, but typically the snapshot data is stored in the same storage array as the original production data. Furthermore, the data that is common to the snapshot and production disk (data that hasn’t changed since the snapshot was taken) is usually stored at the exact same location. Even though raids are used at most times for redundancy and robustness, this means that if something happens in the hardware level and the data of the original production volume is lost or corrupted, there is a good chance the snapshot data suffered the same fate.

This does not apply to EBS snapshots. In many ways EBS snapshots are similar to hardware snapshots, but with EBS snapshots the data is actually copied to a separate location, so any failure in the original EBS volume does not affect snapshot data. What’s even more compelling is that this separate location is actually S3, AWS’s storage cloud, which is an especially robust and durable storage target.

Oracle scripts for EBS snapshot-based backup:

So, all we need to do to backup oracle in a consistent manner, is to make oracle move into backup mode just before the EBS snapshots are taken, and to release it from backup mode once the snapshots started. There is no need to wait until the snapshots are completed since EBS snapshots are guaranteed to be consistent to the time they started. Any changes to the data after a snapshot is started will not affect the data stored in the snapshot. Luckily, Cloud Protection Manager (CPM) supports this exact scenario by defining backup scripts.

There’s a “before” script that will contain the transition into backup mode operation, the “after” script that will release from backup mode, and the “complete” script that may be left empty if no cleanup or other operation need to happen after backup is complete.

So, using CPM, we defined a backup policy named “oracle,” and added the Oracle Server instance as the backup target for this policy. To enable backup scripts, we used the “More Options” screen:

oracle policy options

We even enabled collection of script output, so if a script prints out errors or other information to stderr, CPM will collect that output, store it in its database, and will allow viewing it in the UI.

The Scripts:

Before script “before_oracle”:

#!/bin/bash
remote=oracle@<oracle server address>

ssh -i oracle_id_rsa_file $remote <<EOFSSH
bash <<EOFBASH
/opt/app/oracle/product/10.2.0/db_1/bin/sqlplus -S “/ as sysdba” <<ENDOFSQLPLUS
ALTER DATABASE BEGIN BACKUP;
ENDOFSQLPLUS
EOFBASH
EOFSSH
if [ $? -gt 0 ]; then
echo “Failed locking oracle” 1>&2
exit 1
else
echo “locking oracle successful” 1>&2
fi

This script simply connects to the Oracle server via SSH, runs SQLPlus, Oracles admin client, and runs the “ALTER DATABASE BEGIN BACKUP;” command, which moved the database into backup mode. Please note that the SSH RSA key file needs to be stored in the CPM server and to be at the location specified (in this case under the name “oracle_id_rsa_file” in the same folder of the scripts).

After script “after_oracle”:

#!/bin/bash
remote=oracle@<oracle server address>

ssh -i oracle_id_rsa_file $remote <<EOFSSH
bash <<EOFBASH
/opt/app/oracle/product/10.2.0/db_1/bin/sqlplus -S “/ as sysdba” <<ENDOFSQLPLUS
ALTER DATABASE END BACKUP;
ENDOFSQLPLUS
EOFBASH
EOFSSH
if [ $? -gt 0 ]; then
echo “Failed to take DB Backup” 1>&2
exit 1
else
echo “Oracle DB backup successful” 1>&2
fi

This script is built the same, only does the opposite operation: ends the backup mode. The “complete” script needs to be provided, but can be an empty one.

Recovery:

After the scripts are tested and working CPM will take care of the rest. Examples of how to recover an instance can be found in previous posts and in CPM’s documentation. After recovering the Oracle server, there is still one last operation that needs to be done. Since the snapshots were taken while Oracle was in backup mode, the recovered Oracle server will also be in backup mode, and needs to be released from that state using the following commands: (after login to the new Oracle server):

sqlplus / as sysdba
startup mount;
alter database end backup;
alter database open;

Oracle will not start automatically at boot time, so it needs to be mounted, released from backup mode and then opened.

After those operations, Oracle will work without any further issues. Overall with this backup method, one can count on their backup solution to be consistent and reliable, and complete Oracle servers can be recovered in minutes.

Share this post →

You might also like: