An End To Downtime: Learning From The IT Failure At British Airways

The British Airways execs and staff may not want to hear this, but the collapse of the airline’s entire computing system on May 27, 2017 did not have to happen. Major companies like British Airways have outgrown technology that leaves them at the mercy of power surges, human error, and other technological hiccups. With a resilient environment and professional disaster management, corporate systems can run seamlessly and avoid the high costs of downtime. The events at British Airways show why companies should invest the time and money in migrating to the cloud now, before a disaster strikes.

What Happened at British Airways?

British Airways explained that the worldwide computer shutdown was caused by a power outage at one of its IT hubs. Although the airline had a backup system ready to go, it did not kick in, leading to hundreds of canceled flights and ruined travel plans for more than 75,000 people. British Airways had a disaster recovery plan, and it failed when disaster hit. Imagine the panic in the British Airways offices as minutes of downtime turned into hours.

This is not to make light of what happened: it must have been a genuinely awful experience, especially since it appears that the entire situation was caused by one technician’s mistake. However, to place the blame solely on human error is disingenuous. British Airways’ infrastructure was an accident waiting to happen. As a major airline with an extensive IT infrastructure, British Airways made a mistake by keeping its equipment in-house instead of moving to a cloud-based infrastructure with automated disaster recovery. We can speculate, as many journalists have, about whether this was the result of cost-cutting, but the reasoning is immaterial. Other airlines have made the move to the cloud and are benefiting from it. A dated IT system and an in-house disaster recovery team cannot give modern companies the uninterrupted operation their systems require.

The Problem is Not Unique

This is far from the first example of its kind. Outdated IT plagues the airline industry. British Airways itself suffered a previous IT catastrophe several years ago, and other airlines have fallen victim to the same problem. The failure of a single router cost Southwest $54 million for 12 hours of downtime. Delta was forced to cancel hundreds of flights when its IT system went down after a power outage, costing it a cool $150 million. These are just some examples of established companies that patched together old IT systems rather than migrating to a truly resilient IT infrastructure with modern disaster recovery practices. Indeed, in the Southwest incident, the Southwest Airlines Pilots Association named the outdated infrastructure, and management’s unwillingness to update it, as the causes of the disaster.

Companies may make this choice because they believe upgrading their infrastructure is too costly, but the cost of ignoring the need may be greater. It is estimated that the human error of one technician at British Airways, coupled with the homegrown disaster script failing in real time, cost British Airways tens of millions of pounds, in addition to a 3% dip in share price and an immeasurable loss of reputation. And this is just a drop in the bucket. According to Dun & Bradstreet, about 56% of Fortune 500 companies experience 1.6 hours of downtime A WEEK, which it conservatively estimates can cost a company $46 million a year in lost labor. Companies also hold on to old infrastructure to avoid the time it takes to upgrade. But, as recent events prove, it may be more time-consuming NOT to make that change.
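To put the downtime figures quoted above in perspective, it helps to convert them to a cost per hour. The following is a back-of-envelope sketch that uses only the numbers quoted in this post; it assumes the 1.6 hours per week holds steadily across a 52-week year, which the original estimate may not do. The function names are illustrative.

```python
"""Back-of-envelope downtime arithmetic using only figures quoted in this post."""

WEEKS_PER_YEAR = 52


def annual_downtime_hours(hours_per_week: float, weeks: int = WEEKS_PER_YEAR) -> float:
    """Total downtime per year, assuming the weekly rate holds every week."""
    return hours_per_week * weeks


def cost_per_hour(total_cost_usd: float, hours: float) -> float:
    """Average cost of one hour of downtime."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_cost_usd / hours


if __name__ == "__main__":
    # Dun & Bradstreet figures: 1.6 hours per week, $46 million per year in lost labor.
    hours = annual_downtime_hours(1.6)
    print(f"Downtime per year: {hours:.1f} hours")
    print(f"Implied labor cost per hour: ${cost_per_hour(46_000_000, hours):,.0f}")

    # Southwest: $54 million for 12 hours of downtime.
    print(f"Southwest outage cost per hour: ${cost_per_hour(54_000_000, 12):,.0f}")
```

Run as a script, this reports 83.2 hours of downtime a year, which puts the implied labor cost at roughly $553,000 per hour; the Southwest figures work out to $4.5 million per hour of outage.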

What About Disaster Preparation?

On that fateful Saturday, when British Airways screens went dark, everyone was asking: why don’t they have a backup? Why isn’t their disaster recovery plan kicking in and getting them up and running again? The thing is, British Airways had a backup. It had invested in a disaster recovery plan. It just did not work. This is an all-too-common occurrence in catastrophic IT events. Corporations keep patching IT systems that run on traditional, on-premises infrastructure and writing homegrown disaster scripts that are notoriously prone to human error. This type of infrastructure is simply not reliable enough for today’s business environment. The travel plans of hundreds of thousands of people should not be at the mercy of a single person’s miscalculation.
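The weakness of a homegrown disaster script is that failover waits for someone to notice the outage and act correctly. As a generic illustration (not a description of British Airways’ setup or of any N2WS feature), the sketch below shows the core of an automated failover decision: the standby site is promoted after a fixed number of consecutive failed health checks. The threshold of three and the names `FailoverMonitor` and `run_checks` are hypothetical, and the real health checks and promotion step are left out.

```python
"""Sketch of an automated failover decision (hypothetical, for illustration).

The standby site is promoted after a fixed number of consecutive failed health
checks, so recovery does not wait for someone to notice the outage and run a
script by hand. Real health checks and the promotion step (DNS, load balancer,
database replica) are out of scope here.
"""
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class FailoverMonitor:
    failure_threshold: int = 3      # assumed value; tune to your tolerance for false alarms
    consecutive_failures: int = 0
    active: str = "primary"
    events: List[str] = field(default_factory=list)

    def record(self, healthy: bool) -> str:
        """Record one health-check result and return the site now serving traffic."""
        if self.active == "standby":
            return self.active      # already failed over; failing back is a deliberate step
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = "standby"
                self.events.append("failover to standby")
        return self.active


def run_checks(monitor: FailoverMonitor, results: Iterable[bool]) -> str:
    """Feed a sequence of health-check results to the monitor."""
    for healthy in results:
        monitor.record(healthy)
    return monitor.active


if __name__ == "__main__":
    monitor = FailoverMonitor(failure_threshold=3)
    # A single blip is tolerated; three failures in a row trigger failover.
    print(run_checks(monitor, [True, False, True, False, False, False]))
    print(monitor.events)
```

Requiring several consecutive failures keeps a single dropped check from causing an unnecessary switch, and because the decision lives in code rather than in a manual runbook step, it can be exercised in a drill as often as you like.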

Automation For the Win

Corporate IT systems require automation for maximum reliability. The future of data and application storage is in the cloud, and the most secure backup infrastructure automates backup creation. This is the service that N2WS Cloud Protection Manager (CPM) provides. CPM is extremely resilient because it is engineered with disaster recovery in mind. With automated backups set to run as often as your organization requires, you can minimize the risk of data loss and test that recovery works smoothly before a disaster hits. From a management standpoint, CPM is easy to use, with an intuitive interface. CPM builds your company’s knowledge into backup policies, leaving no room for human error: everything is automated and set up in advance. When you need your backups, they can be restored at the touch of a button.
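Automated backup tools typically pair a schedule with a retention policy that expires old backups, plus periodic restore tests. The sketch below illustrates the last two under simple assumptions: a keep-the-newest-N retention policy and a SHA-256 checksum recorded when each backup is taken. It is a generic illustration, not CPM’s implementation or API, and `Backup`, `apply_retention`, and `verify_restore` are hypothetical names.

```python
"""Sketch of a keep-last-N backup retention policy plus a restore check.

Illustrative only: this is not how N2WS CPM works internally. Backup,
apply_retention, and verify_restore are hypothetical names for the example.
"""
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    checksum: str                   # SHA-256 of the data when the backup was taken


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def apply_retention(backups: List[Backup], keep: int) -> Tuple[List[Backup], List[Backup]]:
    """Split backups into (kept, expired), keeping the `keep` most recent."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(backups, key=lambda b: b.taken_at, reverse=True)
    return newest_first[:keep], newest_first[keep:]


def verify_restore(backup: Backup, restored: bytes) -> bool:
    """A restore test passes only if the restored data matches the recorded checksum."""
    return sha256(restored) == backup.checksum


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # Six hourly backups of a changing piece of state.
    backups = [Backup(start + timedelta(hours=h), sha256(f"state-{h}".encode())) for h in range(6)]
    kept, expired = apply_retention(backups, keep=4)
    print([b.taken_at.hour for b in kept])     # [5, 4, 3, 2]
    print([b.taken_at.hour for b in expired])  # [1, 0], the backups to delete
    print(verify_restore(kept[0], b"state-5"))  # True
```

Comparing a checksum after a test restore is one simple way to confirm a backup is usable before it is needed; a fuller drill would also boot the restored system and check that the application responds.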

Controlling What We Can

With the proliferation of malware, ransomware, and IT outages, it is not really a question of “if” a disaster will strike your IT system, but “when.” To be a reliable steward of your company’s IT, you must have a plan. While it’s not possible to control everything that will happen to your company, you can control how you react. As the British Airways debacle shows, the time to move toward a highly resilient cloud deployment is now. Cases like this are a reminder that an ounce of prevention is worth a pound of cure: it is simply much cheaper to plan for the worst-case scenario.
