You’ve set up AWS S3 backups for your critical data, but something’s not working right. Your backups are failing, costs are skyrocketing, or files are mysteriously missing after restore. Let’s dive deep into the three most common S3 backup headaches and provide detailed, CLI-focused solutions to get your backups back on track.
1. The Permission Puzzle: IAM Configuration Issues
The number one culprit behind S3 backup failures is improperly configured Identity and Access Management (IAM) permissions. When your backup job fails with an “Access Denied” error or a message indicating AWS Backup can’t describe your S3 resource, you’re facing a permissions problem.
Understanding the Root Cause
AWS Backup for S3 requires specific permissions to access your bucket, create snapshots, and manage recovery points. Even when using the default AWS Backup service role, S3-specific policies aren’t automatically attached. This creates a common trap: assuming the default role has sufficient permissions, when in fact it needs additional policy attachments.
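Before changing any policies, it helps to confirm that permissions really are the cause. A quick check (a sketch; narrow the filters to your time window and resources as needed) is to list recent failed S3 backup jobs and read their status messages:
# List failed S3 backup jobs and show why they failed
aws backup list-backup-jobs \
  --by-resource-type S3 \
  --by-state FAILED \
  --query 'BackupJobs[*].[BackupJobId,ResourceArn,StatusMessage]' \
  --output table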
Let’s fix this issue step by step using the AWS CLI:
First, identify the role used by your backup plan:
# List your backup plans to find the one in question
aws backup list-backup-plans
# Get details of your specific backup plan (replace with your plan ID)
aws backup get-backup-plan --backup-plan-id YOUR_BACKUP_PLAN_ID
# Examine the roles used
aws backup get-backup-selection --backup-plan-id YOUR_BACKUP_PLAN_ID --selection-id YOUR_SELECTION_ID
If you’re using a custom IAM role, the quickest fix is to attach the AWS-managed policies AWSBackupServiceRolePolicyForS3Backup and AWSBackupServiceRolePolicyForS3Restore to it. If you’d rather manage the permissions yourself, create a policy document that covers the core S3 actions AWS Backup needs:
# Create a JSON policy file named s3-backup-policy.json
cat > s3-backup-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:GetObjectVersion",
        "s3:GetObjectVersionTagging",
        "s3:PutObject",
        "s3:GetBucketVersioning",
        "s3:PutBucketVersioning",
        "s3:GetBucketNotification",
        "s3:PutBucketNotification",
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    }
  ]
}
EOF
# Create and attach the policy to your role
aws iam create-policy \
  --policy-name S3BackupCustomPolicy \
  --policy-document file://s3-backup-policy.json
# Get the policy ARN from the output and attach it
aws iam attach-role-policy \
  --role-name ROLE_NAME \
  --policy-arn POLICY_ARN
To verify that your role now has the correct permissions:
# Check attached policies
aws iam list-attached-role-policies --role-name ROLE_NAME
After fixing the permissions, test your backup by running a manual job:
# Start an on-demand backup
aws backup start-backup-job \
  --backup-vault-name YOUR_VAULT_NAME \
  --resource-arn arn:aws:s3:::YOUR_BUCKET_NAME \
  --iam-role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/ROLE_NAME
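The start-backup-job call returns a BackupJobId; polling it is a quick way to confirm the permissions fix actually worked. A minimal check, using the job ID from the previous command’s output:
# Check the state of the on-demand backup job (replace with the returned job ID)
aws backup describe-backup-job \
  --backup-job-id YOUR_BACKUP_JOB_ID \
  --query '[State,StatusMessage]' \
  --output text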
N2W Solution: N2W simplifies permissions management with pre-configured IAM role templates that ensure your backups have exactly the right access. The platform automatically verifies permissions and alerts you to any missing or incorrectly configured access rights, eliminating the common ‘Access Denied’ errors that plague manual S3 backup configurations.
2. The Versioning Explosion: Storage Costs Going Wild
Have you noticed your backup storage costs mysteriously doubling or tripling despite your S3 bucket size staying relatively stable? You’re experiencing the versioning explosion with S3 backups – a direct result of S3 versioning combined with ineffective lifecycle management for versioned objects.
Understanding the Root Cause
AWS Backup for S3 requires that S3 Versioning be enabled on the bucket being backed up. When versioning is enabled, every modification creates a new version instead of overwriting the file. Without proper noncurrent version lifecycle rules, these versions accumulate indefinitely, and all of them get backed up.
Each of these factors multiplies your storage costs:
- Every deleted file creates a delete marker (but the file still exists as a non-current version)
- Every modified file creates a new version (the old version becomes non-current)
- Every backup captures all versions, including non-current ones
- Continuous backups retain multiple copies of all these versions
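Before adding lifecycle rules, it’s worth quantifying the bloat. The sketch below (fine for small-to-medium buckets; for very large buckets, an S3 Inventory report is a better fit) confirms versioning is enabled and counts noncurrent versions and delete markers using jq:
# Confirm versioning is enabled (AWS Backup for S3 requires it)
aws s3api get-bucket-versioning --bucket YOUR_BUCKET_NAME
# Count noncurrent object versions
aws s3api list-object-versions \
  --bucket YOUR_BUCKET_NAME \
  --query 'Versions[?IsLatest==`false`]' \
  --output json | jq length
# Count delete markers
aws s3api list-object-versions \
  --bucket YOUR_BUCKET_NAME \
  --query 'DeleteMarkers' \
  --output json | jq length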
The solution involves implementing comprehensive lifecycle rules to automatically manage versions. Here’s how to do it with the AWS CLI:
First, create a lifecycle configuration file:
cat > lifecycle-config.json << 'EOF'
{
  "Rules": [
    {
      "ID": "ExpireNoncurrentVersions",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    },
    {
      "ID": "CleanupDeleteMarkers",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Expiration": {
        "ExpiredObjectDeleteMarker": true
      }
    },
    {
      "ID": "ArchiveOlderVersions",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "NoncurrentDays": 60,
          "StorageClass": "GLACIER"
        }
      ]
    }
  ]
}
EOF
This configuration:
- Expires noncurrent versions after 90 days
- Removes expired delete markers automatically
- Transitions noncurrent versions to cheaper storage classes as they age (S3 Standard-IA after 30 days, S3 Glacier after 60; note that S3 requires at least 30 days before a transition to Standard-IA)
Now apply this S3 versioning lifecycle policy to your bucket:
aws s3api put-bucket-lifecycle-configuration \
  --bucket YOUR_BUCKET_NAME \
  --lifecycle-configuration file://lifecycle-config.json
Verify that your configuration was applied correctly:
aws s3api get-bucket-lifecycle-configuration --bucket YOUR_BUCKET_NAME
To monitor your backup storage usage and identify cost growth trends:
# Get a list of your recovery points
aws backup list-recovery-points-by-backup-vault \
  --backup-vault-name YOUR_VAULT_NAME \
  --by-resource-arn arn:aws:s3:::YOUR_BUCKET_NAME \
  --query 'RecoveryPoints[*].[RecoveryPointArn,BackupSizeInBytes,CreationDate]' \
  --output table
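To see how the bucket itself (current plus noncurrent versions) is growing over time, you can also pull the daily BucketSizeBytes metric that S3 publishes to CloudWatch. A sketch, assuming the objects live in the STANDARD storage class and a shell with GNU date (the -d flag):
# Daily bucket size over the last 30 days, for the STANDARD storage class
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=YOUR_BUCKET_NAME Name=StorageType,Value=StandardStorage \
  --start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Average \
  --output table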
For existing versioning bloat, you might want to perform a one-time cleanup of extremely old non-current versions:
# List objects and versions to identify candidates for cleanup
aws s3api list-object-versions \
  --bucket YOUR_BUCKET_NAME \
  --prefix OPTIONAL_PREFIX \
  --query 'Versions[?LastModified<=`2023-01-01`].[Key, VersionId]'
# Delete specific old versions (use with caution)
aws s3api delete-object \
  --bucket YOUR_BUCKET_NAME \
  --key OBJECT_KEY \
  --version-id VERSION_ID
Key Insight: N2W provides intelligent storage tiering that automatically moves older backups to cost-effective storage without requiring complex lifecycle policies. The system tracks storage costs by backup policy, giving you clear visibility into which backup strategies are causing cost increases.
3. The “Completed with Issues” Mystery: Missing Files After Restore
Perhaps the most frustrating scenario is when AWS Backup completes with a status of “Completed with issues,” yet upon restoring the S3 bucket, you discover that some files are missing. This can lead to dangerous situations where you believe your data is protected when it’s not.
Understanding the Root Cause
Several factors can cause this issue:
- Object-specific permissions: some objects in the bucket may have ACLs or other restrictions that prevent the backup service from reading them
- Cold-storage objects: objects already in the S3 Glacier or Glacier Deep Archive storage classes are skipped by AWS Backup for S3
- AWS Backup transient errors: network or service issues during backup might affect specific objects
The biggest challenge is that the backup notifications typically lack details about which specific objects failed to be included.
Let’s implement a comprehensive solution to identify and resolve these issues:
First, enable detailed SNS notifications for S3 backup object failures:
# Create an SNS topic for backup notifications
aws sns create-topic --name S3BackupNotifications
# Create a policy document for the topic
cat > topic-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "backup.amazonaws.com"
      },
      "Action": "SNS:Publish",
      "Resource": "arn:aws:sns:REGION:ACCOUNT_ID:S3BackupNotifications"
    }
  ]
}
EOF
# Apply the policy to the topic (replace with your AWS account ID and region)
aws sns set-topic-attributes \
  --topic-arn arn:aws:sns:REGION:ACCOUNT_ID:S3BackupNotifications \
  --attribute-name Policy \
  --attribute-value file://topic-policy.json
# Subscribe your email to the topic (replace with your email)
aws sns subscribe \
  --topic-arn arn:aws:sns:REGION:ACCOUNT_ID:S3BackupNotifications \
  --protocol email \
  --notification-endpoint your-email@example.com
# Configure backup vault notifications to use this topic
aws backup put-backup-vault-notifications \
  --backup-vault-name YOUR_VAULT_NAME \
  --sns-topic-arn arn:aws:sns:REGION:ACCOUNT_ID:S3BackupNotifications \
  --backup-vault-events S3_BACKUP_OBJECT_FAILED S3_RESTORE_OBJECT_FAILED
Next, identify objects in Glacier or Deep Archive storage classes:
# Create a list of objects in GLACIER or DEEP_ARCHIVE
aws s3api list-object-versions \
  --bucket YOUR_BUCKET_NAME \
  --query 'Versions[?StorageClass==`GLACIER` || StorageClass==`DEEP_ARCHIVE`].[Key, VersionId, StorageClass]' \
  --output json > glacier_objects.json
For objects in GLACIER that need to be backed up, initiate an S3 Glacier object restore:
# Create a script to restore each object (requires jq)
cat > restore_glacier_objects.sh << 'EOF'
#!/bin/bash
cat glacier_objects.json | jq -c '.[]' | while read -r obj; do
  key=$(echo "$obj" | jq -r '.[0]')
  version_id=$(echo "$obj" | jq -r '.[1]')
  echo "Restoring: $key (Version: $version_id)"
  aws s3api restore-object \
    --bucket YOUR_BUCKET_NAME \
    --key "$key" \
    --version-id "$version_id" \
    --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'
done
EOF
chmod +x restore_glacier_objects.sh
./restore_glacier_objects.sh
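Glacier restores are asynchronous and can take hours, so before re-running the backup you can spot-check whether an object’s temporary copy is ready. head-object reports this in its Restore field; look for ongoing-request="false":
# Check restore progress for a single object
aws s3api head-object \
  --bucket YOUR_BUCKET_NAME \
  --key OBJECT_KEY \
  --version-id VERSION_ID \
  --query 'Restore' \
  --output text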
To compare what’s in your bucket versus what was successfully backed up:
# Get a complete inventory of your bucket
aws s3api list-object-versions \
  --bucket YOUR_BUCKET_NAME \
  --query 'Versions[*].[Key, VersionId, Size]' \
  --output json > bucket_inventory.json
# Count total objects
cat bucket_inventory.json | jq '. | length'
# Get recovery point details
aws backup get-recovery-point-restore-metadata \
  --backup-vault-name YOUR_VAULT_NAME \
  --recovery-point-arn YOUR_RECOVERY_POINT_ARN > recovery_point_metadata.json
# Search CloudWatch Logs for backup errors (if you forward AWS Backup logs to a log group)
aws logs filter-log-events \
  --log-group-name /aws/backup/YOUR_LOG_GROUP \
  --filter-pattern "error"
For a comprehensive validation, perform a test restore to a new S3 bucket:
# Create a temporary bucket for validation
aws s3 mb s3://backup-validation-temp-bucket
# Start a restore job to the temporary bucket
aws backup start-restore-job \
  --recovery-point-arn YOUR_RECOVERY_POINT_ARN \
  --metadata '{"DestinationBucketName":"backup-validation-temp-bucket"}' \
  --iam-role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/ROLE_NAME
# Compare original bucket with restored data
aws s3 ls s3://YOUR_BUCKET_NAME --recursive | wc -l
aws s3 ls s3://backup-validation-temp-bucket --recursive | wc -l
Finally, implement regular backup validation through automation:
cat > backup-validation.sh << 'EOF'
#!/bin/bash
# Get the latest recovery point
RECOVERY_POINT=$(aws backup list-recovery-points-by-backup-vault \
  --backup-vault-name YOUR_VAULT_NAME \
  --by-resource-arn arn:aws:s3:::YOUR_BUCKET_NAME \
  --query 'RecoveryPoints[0].RecoveryPointArn' \
  --output text)
# Create a timestamp for the validation bucket
TIMESTAMP=$(date +%Y%m%d%H%M%S)
VALIDATION_BUCKET="backup-validation-${TIMESTAMP}"
# Create validation bucket
aws s3 mb s3://${VALIDATION_BUCKET}
# Start restore job
JOB_ID=$(aws backup start-restore-job \
  --recovery-point-arn ${RECOVERY_POINT} \
  --metadata "{\"DestinationBucketName\":\"${VALIDATION_BUCKET}\"}" \
  --iam-role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/ROLE_NAME \
  --query 'RestoreJobId' \
  --output text)
# Wait for restore job to complete
while true; do
  STATUS=$(aws backup describe-restore-job \
    --restore-job-id ${JOB_ID} \
    --query 'Status' \
    --output text)
  if [ "$STATUS" = "COMPLETED" ]; then
    break
  elif [ "$STATUS" = "FAILED" ]; then
    echo "Restore failed!"
    exit 1
  fi
  echo "Waiting for restore to complete... Current status: $STATUS"
  sleep 60
done
# Compare original vs restored bucket
ORIGINAL_COUNT=$(aws s3 ls s3://YOUR_BUCKET_NAME --recursive | wc -l)
RESTORED_COUNT=$(aws s3 ls s3://${VALIDATION_BUCKET} --recursive | wc -l)
echo "Original bucket contains $ORIGINAL_COUNT objects"
echo "Restored bucket contains $RESTORED_COUNT objects"
if [ "$ORIGINAL_COUNT" -eq "$RESTORED_COUNT" ]; then
  echo "✅ Validation PASSED: All objects were successfully backed up"
else
  echo "❌ Validation FAILED: Mismatch in object count"
  DIFF=$((ORIGINAL_COUNT - RESTORED_COUNT))
  echo "$DIFF objects are missing from the backup"
fi
# Cleanup
echo "Cleaning up validation bucket..."
aws s3 rb s3://${VALIDATION_BUCKET} --force
EOF
chmod +x backup-validation.sh
Schedule this S3 backup validation script to run regularly to ensure your backups are complete and usable.
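For example, a weekly cron entry like the one below keeps the check running unattended (the script path and log file are placeholders; adjust them for your environment):
# Run the validation every Sunday at 03:00 and append output to a log
0 3 * * 0 /opt/scripts/backup-validation.sh >> /var/log/backup-validation.log 2>&1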
Best Practice: N2W provides Recovery Scenarios for scheduled testing of backups, though validation requires additional configuration. While the “Dry Run” feature helps test backup parameters, complete validation of file integrity requires manual verification or custom scripting. The interface simplifies scheduling these tests, but full automation of the validation process is not currently supported.
Cut Costs with N2W’s Automated Archival Tiering
N2W takes S3 backup management a step further with native data lifecycle capabilities. Automatically tier your backups, manage retention, and enforce policies without scripts. Your storage costs stay predictable and your data stays protected.
Try N2W today. Deploy in minutes and immediately archive your existing snapshots with hands-off, reliable S3 backup management.
Conclusion
These three issues—S3 IAM backup role permissions, S3 versioning cost optimization, and troubleshooting S3 backup restore issues—represent the most common and challenging problems with AWS S3 backups. By implementing the detailed CLI solutions provided, you can significantly improve the reliability and efficiency of your backup strategy.
Remember that while S3 itself offers impressive durability, a proper S3-specific backup strategy is still essential to protect against logical errors and ensure business continuity. Regular restore testing of your backups remains the gold standard for true data protection.
By methodically addressing these top three issues, you’re well on your way to mastering AWS S3 backups and achieving the peace of mind that comes with knowing your data is truly protected.