Using a gap analysis to reduce system downtime for business continuity

What exactly is business continuity? For a long time, I thought of business continuity as being a subset of my disaster recovery plans. If there was a disaster, I would launch my business continuity plan in order to recover from the disaster.

The Real Niel
Niel Nickolaisen

Then, one of my CIO peers told me about his study of system downtime.

This CIO did an in-depth analysis of the root cause of each of his system downtime incidents. He traced each of the incidents all the way back to the reason for the downtime. His root cause analysis showed that over 70% of his system downtime was self-inflicted by his IT department.

For example, someone might make an untested or unvalidated change to a production system and temporarily bring the system down. Someone might deploy a new version of custom code that conflicts with the production version of the database. Someone might wonder where the other end of that power cord leads and decide to pull the cord from the outlet. This CIO quickly figured out that he could make a 70% improvement in system uptime if his IT staff just stopped doing things that brought down the production systems. Best of all, these improvements were completely within his and his staff's control.
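
A root cause tally like the one this CIO performed can be sketched in a few lines of Python. This is a minimal illustration, not his actual method: the `Incident` record, the cause labels, the choice of which causes count as self-inflicted, and the sample data are all hypothetical, and measuring the share by minutes of downtime (rather than by incident count) is an assumption.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: these hypothetical cause labels are the ones treated as
# "self-inflicted" by the IT department. Adjust to your own categories.
SELF_INFLICTED = {"untested_change", "code_db_mismatch", "human_error"}


@dataclass(frozen=True)
class Incident:
    system: str        # which production system went down
    minutes_down: int  # length of the outage in minutes
    root_cause: str    # label assigned after tracing the incident back


def self_inflicted_share(incidents):
    """Return the fraction of total downtime minutes caused by IT's own actions."""
    total = sum(i.minutes_down for i in incidents)
    if total == 0:
        return 0.0
    own = sum(i.minutes_down for i in incidents if i.root_cause in SELF_INFLICTED)
    return own / total


def downtime_by_cause(incidents):
    """Return (root_cause, minutes) pairs, largest contributor first."""
    minutes = Counter()
    for i in incidents:
        minutes[i.root_cause] += i.minutes_down
    return minutes.most_common()


# Made-up sample data, for illustration only.
SAMPLE = [
    Incident("erp", 40, "untested_change"),
    Incident("web", 30, "code_db_mismatch"),
    Incident("mail", 10, "human_error"),
    Incident("erp", 20, "hardware_failure"),
]

if __name__ == "__main__":
    print(f"Self-inflicted share: {self_inflicted_share(SAMPLE):.0%}")
    for cause, minutes in downtime_by_cause(SAMPLE):
        print(f"{cause}: {minutes} min")
```

Sorting the causes by minutes lost puts the largest contributors first, which is where a process fix is likely to pay off soonest.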

After my friend shared his results with me, business continuity took on a new meaning. Rather than being a part of my disaster recovery plans, continuous business operations should be my standard mode of operation. I still need the ability to recover from a disaster, but that is a subset of my ability to run a credible, reliable IT department.

With continuous business operations as my new goal, I performed my own analysis: not an analysis of the reasons for our system downtime, but a gap analysis of our IT processes. I was looking for process weaknesses that would lead to us applying a patch without testing it, to someone not knowing where a power cord was connected, or to someone not knowing about an incompatibility between development tools and the database.

For my gap analysis, I decided to find the person in IT who knew the least about technology. I commissioned that person to "walk" our various processes and identify the holes that the uninitiated might accidentally fall through. Fortunately, I had the perfect uninformed person -- me!

I did not launch grueling, exhaustive process reviews. Instead, I asked people to describe the process for applying a patch, moving a code change into production, verifying changes and so on. Working from these descriptions, our analysis rested on three points:

  • First, the purpose of the gap analysis was to see how our processes could help people succeed. It wasn't to blame people -- blaming people rarely results in process improvements.
  • Second, what holes in the process did we need to fill to reduce the opportunity for mistakes?
  • Third, how could we simplify the process so everyone could understand and follow it? (I learned a long time ago that the more complex the process, the more likely the mistakes.)
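
The second and third points above can be made concrete with a small check over a process description. The sketch below is hypothetical throughout: the safeguard names, the `find_gaps` helper and the step limit of seven are assumptions chosen for illustration, not the checks we actually used.

```python
# Assumption: a hypothetical list of safeguards every change process should
# provide somewhere among its steps.
REQUIRED_SAFEGUARDS = {
    "tested_before_production",
    "compatibility_checked",
    "rollback_plan",
    "owner_identified",
}


def find_gaps(process_steps, max_steps=7):
    """Check a described process for missing safeguards and excess complexity.

    process_steps is a list of (description, safeguards) pairs, where
    safeguards is the set of safeguard names that step provides.
    Returns (missing, too_complex): a sorted list of safeguards that no step
    covers, and whether the process has more than max_steps steps.
    """
    covered = set()
    for _description, safeguards in process_steps:
        covered |= safeguards
    missing = sorted(REQUIRED_SAFEGUARDS - covered)
    too_complex = len(process_steps) > max_steps
    return missing, too_complex


# A made-up patch process as someone might describe it aloud.
PATCH_PROCESS = [
    ("Download the patch from the vendor", set()),
    ("Assign the patch to an administrator", {"owner_identified"}),
    ("Apply the patch to the test server", {"tested_before_production"}),
    ("Apply the patch to production", set()),
]

if __name__ == "__main__":
    missing, too_complex = find_gaps(PATCH_PROCESS)
    print("Missing safeguards:", ", ".join(missing) or "none")
    print("Too many steps:", too_complex)
```

The output names holes in the process rather than people, which keeps the review in line with the first point.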

As we implemented these process improvements, our business continuity improved. We still have unintended downtime, but it's rare, and each time it happens we have the opportunity to further refine our IT processes for change management, project management, service management, communication and so on.

And should we ever have a disaster to recover from, we have a tight set of processes that will help us recover better and faster.

Niel Nickolaisen is CIO at Western Governors University in Salt Lake City. He is a frequent speaker, presenter and writer on IT's dual role enabling strategy and delivering operational excellence. Write to him at nnick@wgu.edu.

This was first published in July 2009
