AKS Backup: Basics, Velero Tutorial, and Key Considerations

There are two primary approaches for backing up AKS: the native option and the third-party option. In this post we walk through both.
Backing Up Azure Kubernetes Service 

Backing up Azure Kubernetes Service (AKS) is essential for protecting stateful applications and persistent data in production environments. Alongside stateless workloads, AKS deployments often include databases, configuration files, and persistent volumes that are critical to service continuity.

Losing these components due to accidental deletion, misconfiguration, or cluster failure can lead to significant downtime and data loss. However, backup in Kubernetes is more complex than traditional infrastructure because cluster state is distributed across various resources and storage backends.

There are two primary approaches for backing up AKS: 

  • The native option integrates directly with Azure Backup and offers managed support for snapshot-based backups, backup policies, and long-term retention using storage tiers. It’s suitable for teams looking for simplicity and tight integration with Azure services. 
  • The open source tool Velero provides flexible, Kubernetes-native backup that supports multiple cloud providers. It allows fine-grained control over backup scope, scheduling, and recovery workflows. Velero is especially useful for teams needing cluster portability, multi-cloud strategies, or backup of non-Azure Kubernetes clusters.

How Does AKS Backup Work? 

Native AKS Backup

AKS backup protects and recovers workloads and persistent data in Azure Kubernetes Service clusters. To use this feature, you must install the backup extension directly inside the AKS cluster. This extension communicates with a backup vault, which manages backup and restore operations. Without this extension, backup and restore functionality is not available.

Configuration details

When AKS backup is configured, you must specify a storage account and a blob container where the backup data will be stored. An extension identity is automatically created in the managed resource group of the cluster. This identity is granted the Storage Account Contributor role, which allows it to write backup data to the specified storage account.

Trusted access must also be enabled between the AKS cluster and the backup vault. This feature gives the backup vault the permissions required to access and manage backup operations across the cluster, whether it is public, private, or IP-restricted.

Storage tiers and workflow

AKS backup supports two storage tiers: The operational tier stores backups as snapshots within your tenant for recovery, and the vault tier stores backups as blobs outside your tenant for long-term retention and cross-region restore.

Backups are first created in the operational tier. Once per day, one recovery point, usually the first successful backup of the day, can be moved to the vault tier. Vault tier backups support restoration in secondary Azure regions.

You can configure backup policies that define the schedule and scope of backups. These can target specific namespaces or the entire cluster. During restore, you can recover data either to the original cluster or to another cluster within the same subscription and region.

Persistent volumes are backed up using disk snapshots, which are stored in a separate snapshot resource group. These disk snapshots, combined with the saved cluster state in blob storage, form the recovery point. This setup ensures that application data and cluster configurations can be restored.

AKS Backup with Velero

Velero is an open-source tool that provides backup and restore capabilities for Kubernetes clusters, including Azure Kubernetes Service. It works by capturing cluster resources and persistent volume data, then storing them in external object storage. In AKS environments, Azure Blob Storage is typically used as the backup location. Velero is useful when teams need portable backups or want more control than the native AKS backup solution provides.

Deploying the Velero server

To use Velero with AKS, you deploy the Velero server inside the cluster and configure it with an Azure storage account and blob container. Velero uses a service principal or managed identity to authenticate with Azure and write backup data to the storage location. Once configured, Velero can back up Kubernetes objects such as pods, services, deployments, and config maps.

For persistent volume data, Velero supports two approaches. It can create disk snapshots through the Azure disk snapshot API, or it can copy volume data at the file level using restic (or, in newer releases, Kopia) running in Velero's node-agent pods. Snapshot-based backups are faster and work well with Azure managed disks, while file-level backups provide more flexibility for other storage types.

Backup triggers and recovery

Backups can be triggered manually or run on a recurring basis using Velero schedules. A schedule defines which resources to include and, through a cron expression, how frequently backups run. Velero stores metadata about the cluster state along with references to persistent volume backups, allowing complete application environments to be recreated.

During recovery, Velero restores Kubernetes resources and reattaches persistent volumes from snapshots or file backups. Restores can target the same AKS cluster or a different cluster, which makes Velero useful for migration, disaster recovery, and cluster upgrades. This portability is one of the main reasons teams adopt Velero for Kubernetes backup strategies.

Tutorial: Backing Up AKS with Velero 

Backing up an AKS cluster with Velero involves installing the Velero CLI, configuring Azure storage for backups, deploying Velero to the cluster, and creating backup and restore operations. These instructions are based on a Microsoft Build How-To Guide.

1. Install Required Tools

Start by installing the Azure CLI and Chocolatey on your machine. Chocolatey can be used to install the Velero CLI on Windows.

Open PowerShell as an administrator and sign in to Azure:

az login --use-device-code

Install the Velero CLI:

choco install velero

If needed, switch to the Azure subscription where backups will be stored:

az account set -s <SUBSCRIPTION_ID>

2. Create Azure Storage for Backups

Velero stores backups in Azure Blob Storage. Create a resource group, storage account, and blob container to hold the backup data.

Create a resource group:

az group create -n Velero_Backups --location WestUS

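Create the storage account. The multi-line commands in this tutorial use bash-style \ line continuations; in PowerShell, replace each trailing \ with a backtick (`) or enter the command on a single line: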

az storage account create \
 --name <STORAGE_ACCOUNT_NAME> \
 --resource-group Velero_Backups \
 --sku Standard_GRS \
 --kind BlobStorage \
 --access-tier Hot \
 --https-only true
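
Storage account names are stricter than resource group names: they must be 3 to 24 characters long and contain only lowercase letters and digits, so a name like Velero_Backups would be rejected here. A minimal pre-flight check might look like the following sketch. The function name is_valid_storage_account_name and the sample name are hypothetical, and the check cannot confirm that the name is globally unique.

```shell
# Hypothetical helper: checks the length and character rules for a storage
# account name. Global uniqueness can only be verified by Azure itself.
is_valid_storage_account_name() {
  name="$1"
  case "$name" in
    # Reject an empty name or any character other than a-z and 0-9.
    ""|*[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 24 ]
}

STORAGE_ACCOUNT_NAME="velerobackups01"  # hypothetical example name
if is_valid_storage_account_name "$STORAGE_ACCOUNT_NAME"; then
  echo "name looks valid: $STORAGE_ACCOUNT_NAME"
else
  echo "invalid storage account name: $STORAGE_ACCOUNT_NAME" >&2
fi
```

Running a check like this before az storage account create avoids a failed deployment caused by an uppercase letter or underscore in the name.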

Create a blob container for Velero backups:

az storage container create \
 -n velero \
 --public-access off \
 --account-name <STORAGE_ACCOUNT_NAME>

3. Create a Service Principal

Velero requires Azure credentials to access storage and manage snapshots. Create a service principal with appropriate permissions:

az ad sp create-for-rbac \
 --name "velero" \
 --role "Contributor" \
 --scopes /subscriptions/<SUBSCRIPTION_ID>

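Record the generated client secret (the password field in the command output). It is shown only once. The same output also includes the appId (client ID) and tenant (tenant ID) values.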

Retrieve the client ID:

az ad sp list --display-name "velero" --query '[0].appId' -o tsv

Then create a credentials file (credentials-velero.txt) containing the required variables:

AZURE_SUBSCRIPTION_ID=<SUBSCRIPTION_ID>
AZURE_TENANT_ID=<TENANT_ID>
AZURE_CLIENT_ID=<CLIENT_ID>
AZURE_CLIENT_SECRET=<CLIENT_SECRET>
AZURE_RESOURCE_GROUP=Velero_Backups
AZURE_CLOUD_NAME=AzurePublicCloud
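
Because this file holds a client secret, it is worth creating it with owner-only permissions rather than pasting values into an editor. The sketch below is one way to do that from a bash shell. It assumes the four identifiers have been exported as environment variables beforehand (the tenant ID can be read with az account show --query tenantId -o tsv); the function name write_velero_credentials is hypothetical, and the angle-bracket placeholders are written when a variable is unset.

```shell
# Hypothetical helper: writes the Velero credentials file with owner-only
# permissions. Values come from environment variables; the placeholders
# from the tutorial are used for anything that is unset.
write_velero_credentials() {
  cred_file="${1:-./credentials-velero.txt}"
  (
    umask 077  # new files are readable and writable by the owner only
    cat > "$cred_file" <<EOF
AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID:-<SUBSCRIPTION_ID>}
AZURE_TENANT_ID=${AZURE_TENANT_ID:-<TENANT_ID>}
AZURE_CLIENT_ID=${AZURE_CLIENT_ID:-<CLIENT_ID>}
AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET:-<CLIENT_SECRET>}
AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP:-Velero_Backups}
AZURE_CLOUD_NAME=${AZURE_CLOUD_NAME:-AzurePublicCloud}
EOF
  )
}
```

Calling write_velero_credentials ./credentials-velero.txt produces the file that the install step reads through --secret-file. Delete the file once Velero is installed, since the secret is then stored in the cluster.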

4. Install Velero in the AKS Cluster

Deploy Velero into the cluster and configure it to use Azure Blob Storage.

velero install \
 --provider azure \
 --plugins velero/velero-plugin-for-microsoft-azure:v1.5.0 \
 --bucket velero \
 --secret-file ./credentials-velero.txt \
 --backup-location-config resourceGroup=Velero_Backups,storageAccount=<STORAGE_ACCOUNT_NAME> \
 --use-restic

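This command creates a velero namespace and deploys the Velero server in the cluster. The --use-restic and --default-volumes-to-restic flags used in this tutorial apply to older Velero releases; in Velero 1.10 and later they are replaced by --use-node-agent and --default-volumes-to-fs-backup, so match the flags to the CLI version that Chocolatey installed.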

Verify that the Velero pods are running:

kubectl -n velero get pods
kubectl logs deployment/velero -n velero

5. Create a Backup

Velero can back up the entire cluster or selected namespaces.

Back up all namespaces:

velero backup create <BACKUP_NAME> --default-volumes-to-restic

Back up a single namespace:

velero backup create <BACKUP_NAME> --include-namespaces <NAMESPACE> --default-volumes-to-restic

Check backup status:

velero backup describe <BACKUP_NAME>

Backup files appear in the configured Azure Blob Storage container.
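
In scripts, it can be more convenient to wait for the backup to finish than to rerun the describe command by hand. The following is a minimal sketch that polls the Backup custom resource with kubectl. It assumes Velero runs in the velero namespace; the function names, the polling interval, and the attempt limit are hypothetical, and the phase names are those reported by recent Velero releases.

```shell
# Returns success when a Velero backup phase is final.
backup_phase_is_final() {
  case "$1" in
    Completed|PartiallyFailed|Failed|FailedValidation) return 0 ;;
    *) return 1 ;;
  esac
}

# Polls the Backup resource until it reaches a final phase.
# Returns 0 if Completed, 1 for any other final phase, 2 on timeout.
wait_for_backup() {
  backup_name="$1"
  interval="${2:-15}"   # seconds between polls (hypothetical default)
  attempts="${3:-120}"  # maximum number of polls (hypothetical default)
  while [ "$attempts" -gt 0 ]; do
    phase="$(kubectl -n velero get backups.velero.io "$backup_name" \
      -o jsonpath='{.status.phase}')"
    if backup_phase_is_final "$phase"; then
      echo "backup $backup_name finished with phase: $phase"
      [ "$phase" = "Completed" ] && return 0
      return 1
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "timed out waiting for backup $backup_name" >&2
  return 2
}
```

A call such as wait_for_backup <BACKUP_NAME> can then gate later steps, for example a restore test, on a backup that actually completed.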

6. Restore From a Backup

To restore a backup, first deploy Velero in the destination cluster using the same credentials and storage configuration.

Confirm the backup is available:

velero backup describe <BACKUP_NAME>

Restore the cluster resources and persistent volumes:

velero restore create --from-backup <BACKUP_NAME>

Velero recreates Kubernetes objects and reconnects persistent volumes from the stored backup data.
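
A restore does not have to overwrite the original namespace. As a sketch, the helper below restores a single namespace from a backup under a different name so the result can be inspected first. The function name restore_namespace_as and the naming scheme for the restore are hypothetical; the flags are standard velero restore create options.

```shell
# Hypothetical helper: restores one namespace from a backup into a
# differently named namespace, then prints the restore details.
restore_namespace_as() {
  backup_name="$1"
  source_ns="$2"
  target_ns="$3"
  restore_name="${backup_name}-${target_ns}"
  velero restore create "$restore_name" \
    --from-backup "$backup_name" \
    --include-namespaces "$source_ns" \
    --namespace-mappings "${source_ns}:${target_ns}" &&
    velero restore describe "$restore_name"
}
```

For example, restore_namespace_as <BACKUP_NAME> <NAMESPACE> <NAMESPACE>-verify leaves the live namespace untouched. The restore runs asynchronously, so the describe output may still show it in progress.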

Azure Kubernetes Service Backup Limitations 

Limitations of Native AKS Backup

Azure Backup for AKS offers tight integration with the Azure ecosystem, providing a managed solution for protecting AKS clusters. However, administrators must carefully consider its numerous restrictions when planning their protection strategies, as they can significantly impact what data is protected, how long it is retained, and which cluster configurations are supported. 

Key limitations:

  • Storage and disk support is limited. Only persistent volumes based on Azure Disks using the CSI driver are supported. Unsupported storage types, such as Azure File Shares, Azure Blob Storage, and Azure Container Storage, are skipped. Specific Azure Disk SKUs, including Premium SSD v2 and Ultra Disks, are excluded.
  • Node pool requirements restrict deployment. The backup extension is only functional on x86-based Linux node pools running Ubuntu or Azure Linux. It cannot be installed on Windows node pools or ARM64 nodes.
  • Networking requires specific configuration. Network-isolated AKS clusters are not natively supported, requiring private endpoints to be configured for backup and restore operations. The extension also mandates a general-purpose v2 storage account in the same region as the cluster, configured with either public access or trusted access.
  • Operational and recovery constraints exist. Backups in the operational tier are crash consistent, meaning they are not guaranteed to align across volumes at the same instant. Vault tier backups are limited to one recovery point per day, resulting in a recovery point objective (RPO) of 24 hours in the primary region and up to 36 hours with cross-region restore enabled. The tiering process to the vault can take up to four hours, and hydrated staging resources must be manually deleted.
  • Certain namespaces are excluded from protection. The kube-system, kube-node-lease, and kube-public namespaces are automatically excluded from the backup scope.
  • Configuration changes and coexistence are restricted. Modifying the backup configuration or the assigned snapshot resource group after initial setup is unsupported. Native AKS backup cannot be used alongside Velero or Velero-based solutions, and resources must not use labels or annotations with the velero.io prefix to avoid conflicts.
  • Resource and feature limits apply. Each backup vault is limited to 5,000 policies and 5,000 backup instances. A single backup instance supports up to 800 namespaces, 10 on-demand backups per day, and 10 restores per day.
  • Version mismatch can cause failures. Mismatches between the cluster version at backup time and restore time may cause failures or warnings, particularly if resources have been deprecated in newer Kubernetes versions. Vault-tier restores rely on hydrated staging resources for recovery in these cases, and vault-tier backups are not supported through Terraform deployment.

Limitations of AKS Backup with Velero

Although Velero is widely used for Kubernetes backup and recovery, it has several limitations that administrators should consider when using it with Azure Kubernetes Service. Velero primarily backs up Kubernetes resources through the Kubernetes API and relies on storage-provider snapshots or file-level backups for persistent volumes. 

Because of this design, backups may not always capture a perfectly consistent point-in-time state across all components of an application, especially when using file-system–based backups. In such cases, data is copied from live file systems, which can lead to inconsistencies if application data changes during the backup process.

Key limitations:

  • Point-in-time consistency is not guaranteed for file-level backups. File-system backups copy data from live file systems, so files that change while the backup runs can be captured in an inconsistent state. Applications with strict consistency needs should be quiesced first, for example with Velero backup hooks, or protected with storage snapshots.
  • External resources are not automatically protected. Velero does not automatically protect resources or data that reside outside the Kubernetes cluster. External components such as message queues or databases must be backed up using their own dedicated tools.
  • Some cluster-level resources require manual effort. Cluster-level resources like certain infrastructure components, RBAC configurations, or admission controllers may need additional manual steps or separate tools to ensure they are fully captured in the backup.
  • Operations rely heavily on command-line and manual configuration. Administrators must define backup locations, schedules, and restore procedures using command-line interface (CLI) commands or YAML configuration. In complex Kubernetes environments, this requires significant operational effort and careful scripting to maintain reliable policies.
  • Performance and resource consumption can be a concern. File-level backups require the Velero node-agent pods to directly access volume file systems, often necessitating elevated privileges. This introduces potential security considerations and increases resource overhead (CPU, memory, network) within the cluster.
  • Compatibility must be managed carefully. Administrators must carefully manage the compatibility between various component versions, including Velero, Kubernetes, and the storage plugins. Not all version combinations are fully tested, requiring validation before upgrading clusters or backup components.

Best Practices for Successful AKS Backup 

Use Scheduled Velero Backups with Retention Policies

Manual backups are useful for testing, but production AKS environments should rely on scheduled backups. Velero supports scheduled backup policies that automatically run backups at defined intervals, such as hourly or daily. This ensures that cluster state and persistent volumes are captured regularly without requiring manual intervention.

Retention policies should also be defined to control how long backups are stored. Without retention rules, backup storage can grow quickly and increase costs. By configuring expiration periods in Velero schedules, old backups are automatically removed while recent recovery points remain available for disaster recovery or troubleshooting.
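
A schedule and its retention period can be set in a single command. The sketch below wraps velero schedule create in a small helper; the function names, the 02:00 cron expression, and the example schedule name are hypothetical. The --ttl flag takes a duration in hours, minutes, and seconds, so the helper converts a retention period in days.

```shell
# Converts a retention period in days to the duration format used by --ttl.
ttl_from_days() {
  echo "$(( $1 * 24 ))h0m0s"
}

# Hypothetical helper: creates a schedule that runs daily at 02:00
# (cluster time, typically UTC) and expires backups after the given days.
create_daily_schedule() {
  schedule_name="$1"
  retention_days="$2"
  velero schedule create "$schedule_name" \
    --schedule "0 2 * * *" \
    --ttl "$(ttl_from_days "$retention_days")"
}
```

For instance, create_daily_schedule daily-cluster 30 requests a 720-hour retention. Running velero schedule get afterwards lists the schedule and its last backup time.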

Prefer CSI Volume Snapshots for Persistent Volumes

When backing up persistent volumes in AKS, CSI-based volume snapshots are typically faster and more efficient than file-level backups. CSI snapshots operate at the storage layer and capture the entire disk state in a short time. This reduces backup duration and lowers the load placed on cluster nodes during backup operations.

File-level backups, which Velero's node agent performs with restic or Kopia, copy data through the file system, which requires additional CPU, memory, and network resources. While this approach is useful for unsupported storage types, snapshot-based backups should be the default choice for Azure managed disks because they provide better performance and simpler restore operations.

Store Backups in External Object Storage

Backups should be stored outside the AKS cluster to protect against cluster failures or accidental deletion. External object storage, such as Azure Blob Storage, provides durable and scalable storage for backup data. Storing backups externally ensures they remain accessible even if the original cluster becomes unavailable.

Using external storage also improves portability. Backup data stored in object storage can be used to restore workloads into new clusters, different environments, or recovery regions. This approach supports migration scenarios, cluster upgrades, and disaster recovery strategies.

Use Vault Tier for Long-Term Retention and Disaster Recovery

The vault tier in Azure Backup provides durable storage for long-term AKS backup retention. While the operational tier allows quick recovery from recent snapshots, the vault tier stores backup data as blobs outside the cluster tenant. This design protects backups from cluster-level failures or configuration errors.

Vault tier backups also support cross-region restore capabilities. If a regional outage occurs, backup data can be restored in a paired Azure region. Organizations that require strong disaster recovery strategies should move at least one daily recovery point into the vault tier for long-term protection.

Monitor Velero Backup Jobs and Logs

Backup operations should be monitored regularly to ensure they complete successfully. Velero provides commands such as velero backup describe and velero backup logs to check backup status, view warnings, and identify failures. Monitoring helps detect issues such as failed snapshots, storage permission errors, or incomplete backups.

In production environments, Velero logs and metrics should also be integrated with monitoring tools such as Azure Monitor, Prometheus, or centralized logging platforms. Automated alerts can notify administrators if backups fail or exceed expected durations. Continuous monitoring ensures that recovery points remain valid and usable when needed.
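
Before a full monitoring integration is in place, a simple periodic check can already surface failed backups. The following sketch lists every Backup resource with its phase and prints only those that need attention. It assumes Velero runs in the velero namespace; the function names are hypothetical, and the output could be fed to whatever alerting mechanism is already in use.

```shell
# Reads "name<TAB>phase" lines on stdin and prints the backups whose
# phase indicates a problem.
filter_unhealthy_backups() {
  awk -F '\t' '
    $2 == "Failed" || $2 == "PartiallyFailed" || $2 == "FailedValidation" {
      print $1 ": " $2
    }'
}

# Lists all Velero backups with their phase and filters the unhealthy ones.
report_unhealthy_backups() {
  kubectl -n velero get backups.velero.io \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    filter_unhealthy_backups
}
```

An empty result from report_unhealthy_backups means no backup is in a failed state. It does not prove that a recent backup exists, so it complements rather than replaces a check on backup age.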

Simplifying Backup & Disaster Recovery with N2W

With N2W, Azure customers get easy backup and disaster recovery whether staying within Azure or extending across their AWS workloads. They have complete visibility under one single console, without needing an extra siloed cloud team, with full control of their data and without needing any additional licensing fees. VMs and disks stay in your own Azure account, costs stay low with automatic Blob Tier management (Hot, Cool and Cold tiers), and recovery is rapid and streamlined with cross-region, cross-account, and cross-cloud DR. Customers also benefit from flexible point-in-time recovery as frequent as every five minutes. There is zero manual configuration needed and no proprietary lock-in.

Recent concerns over the number of outages and ransomware threats mean that customers need a foolproof way to protect their resources without going over budget. N2W gives you the power to stay on with future-forward features such as:

✅ Pro tip: Download the Cloud Outage Survival Guide and see how to rapidly restore your environment, no matter where your backups live and no matter what happens in Azure.
