By now you have likely heard about, or worse yet been impacted by, the glitch that crippled Delta Air Lines’ network and reservations system on Monday and forced the airline to cancel about 1,000 flights worldwide. Delta has stated that a power control module malfunctioned, causing a surge that cut off power to its main computer network. Normally, the systems would switch to backup computer systems almost instantaneously; in this case, however, something didn’t go right. Confidentiality, Integrity, and Availability (CIA) are the cornerstones of information security, and in this case, availability was on the wrong flight path. It is safe to say that this problem, which will ultimately cost the airline millions of dollars, could have been avoided through scenario planning.
While Delta has flapped its wings mightily to get everything back online, this outage is a good example of an important security issue, and of the impact that failing to manage every aspect of it can have.
Availability in the world of InfoSec is similar to availability in the airline industry: there are things that we can control (the mechanical functionality of aircraft, obviously) and things we can’t (e.g., weather, and potentially attackers taking advantage of zero-day exploits). As industries continue to increase their reliance on computers, systems, and networks, complexity rises sharply, which in turn creates multiple opportunities for things to go awry.
For example, in the airline industry, computers and systems are used for everything from reservations to meals to in-flight entertainment. In addition, many airline companies have grown through acquisition and therefore are challenged in connecting disparate systems and networks. With each merger and acquisition, the business must be able to absorb and reconcile the disparate systems, people and processes to ensure they are working in concert.
Don’t Get Stuck in the Middle Seat
Despite all of the technical advances, many airlines still rely on antiquated systems because migrating them to more current platforms could introduce significant risk to the organization. Many of these systems and platforms worked well for their original purposes, but as the business landscape evolved, businesses had to implement tweaks to get them to communicate with each other, often at the cost of weaker security. In addition, these custom applications are frequently developed in house and not well documented. None of this is an excuse to avoid upgrading; rather, it should feed into a risk assessment that demonstrates due diligence in vetting that risk and understanding how it may impact the organization. Utilizing application development best practices like I describe here is a great way to head off many of these concerns.
Bad things happen to good companies. While it seems that a power outage would be included in anyone’s BC/DR (business continuity/disaster recovery) plan, this unfortunate incident shows that some scenarios either weren’t played out, were considered too remote to plan for, or were JOOTT (just one of those things). Your business continuity plan should, among other things, cover various scenarios associated with power failures, including redundant sources of power, UPS (uninterruptible power supply) battery backup, backup power generators, and, of course, failover capabilities. Along with architecting these features into your ecosystems, you should also define and perform test scenarios to ensure that your power backup solution works as advertised! It is unfortunate that a simple power outage crippled the organization and caused millions of dollars in losses along with brand damage.
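To see why testing the failover matters as much as buying the redundant equipment, here is a rough back-of-the-envelope sketch in Python. It is only an illustration: the availability figures are made up, the function names are my own, and the math assumes the power sources fail independently of one another, which a real-world surge may not respect.

```python
from math import prod


def parallel_availability(availabilities):
    """Availability of redundant sources when any single one can carry the load.

    Assumes independent failures and a switchover that always works.
    """
    return 1 - prod(1 - a for a in availabilities)


def with_switchover(primary, backup, switchover_success):
    """Availability of a primary source plus one backup that only helps
    when the switchover to it actually succeeds."""
    return primary + (1 - primary) * switchover_success * backup


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real facility.
    utility_feed = 0.999
    generator = 0.98

    ideal = parallel_availability([utility_feed, generator])
    tested = with_switchover(utility_feed, generator, switchover_success=1.0)
    untested = with_switchover(utility_feed, generator, switchover_success=0.5)

    print(f"On paper:                    {ideal:.5%}")
    print(f"Switchover always works:     {tested:.5%}")
    print(f"Switchover works half the time: {untested:.5%}")
```

With these made-up inputs, a switchover that works only half the time pulls availability from roughly 99.998% down to roughly 99.949%. The redundant hardware is identical in both cases; the only difference is whether the failover path has been proven to work.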
What’s Your Flight Plan?
But it’s more than that. As your environment and ecosystems change over time, it’s easy to miss vulnerabilities and weaknesses that spring up as a result, no matter how granular your change management and SDLC processes are. Many small, seemingly inconsequential changes may ultimately add up to something that significantly impacts your organization. That is why it is critical to perform periodic risk assessments to determine how various outages and failures may impact the overall environment. This takes meticulous planning and a clear understanding of your data flows, networks, systems, applications, and data stores: how they are interconnected and how the availability of each one affects the others. You can check out one of my older posts on risk assessment here.
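One way to make that interconnection analysis concrete is to write your dependencies down as a simple map and ask, "If this one thing fails, what else goes with it?" The Python sketch below does just that. The component names and the dependency map are hypothetical, chosen for illustration; they do not describe Delta’s or any other airline’s actual architecture.

```python
from collections import deque

# Hypothetical map: each component -> the components it directly depends on.
DEPENDS_ON = {
    "power_feed": [],
    "datacenter_network": ["power_feed"],
    "reservations_db": ["datacenter_network"],
    "booking_site": ["reservations_db"],
    "check_in_kiosks": ["reservations_db"],
    "crew_scheduling": ["datacenter_network"],
    "inflight_entertainment": [],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the map: component -> components that depend on it.
    dependents = {}
    for name, deps in depends_on.items():
        dependents.setdefault(name, [])
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk outward from the failed component.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    for component in sorted(DEPENDS_ON):
        hit = sorted(impacted_by(component, DEPENDS_ON))
        print(f"{component}: takes down {len(hit)} other component(s) {hit}")
```

In this toy map, losing `power_feed` takes down five of the other six components, while losing `inflight_entertainment` affects nothing else. Rerunning a check like this whenever the map changes is one inexpensive way to notice when a small change has quietly created a new single point of failure.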
I’m Leaving on a Jet Plane
This unfortunate outage could have been avoided, or at a minimum should have been an accepted risk; we don’t know whether it was. Still, this could have happened to anyone. As a matter of fact, almost all of the major airlines have suffered computer outages, in one form or another, over the years. Take heart and learn from others’ misfortunes: take measures to prevent them from happening to your organization. Continue to practice good security hygiene by performing periodic risk assessments to identify potential issues and their impact. Make sure that your BC/DR plan addresses all of the reasonable scenarios you can conjure up, as you certainly don’t want to wing it here.
Concerned that your organization could use a little help identifying and mitigating risk? Drop me a line and we can continue the conversation.