Tech Manager—Fail Safe
No one wants to think about failing, but I think about it when needed. Failing is part of winning. Most major league baseball players never hit over a .300 batting average. Last year Jeff McNeil of the New York Mets batted .326. He was an All-Star, Silver Slugger and MVP candidate, one of the best in the Majors. But he did not make it on base most of the time. In fact, most players “fail” to get a hit seven out of ten times at bat. And they make millions of dollars per year.
So, let’s think about failing a bit. It has happened in the past and it will happen in the future. You may have heard the adage… “Fail to plan, plan to fail”. But how about failing to plan for failure? Forgetting that might cause great pain and delay. The Cambridge Dictionary defines fail-safe this way: “If something is fail-safe, it has been designed so that if one part of it does not work, the whole thing does not become dangerous.” I used to work in Process Design, and we always specified control valves to fail safe. Some failed open, and others failed closed. You decide which is best. You would not want oil, gas or chemicals gushing all over the place if a valve fails.
Some of you may have seen the movie “Fail Safe”. The plotline has a computer malfunction that triggers an order for a nuclear attack. Tension rises as bombers are scrambled and they try to countermand the order. It is a great Cold War movie from the 1960s. You should watch it.
Fail safe is usually thought of as being automatic. If something bad happens, then a safe response automatically follows. But in technology planning, automatic seldom happens unless it is coded in. You can “code in” fail safe. First you think about failures that might occur, expected or unexpected. Then you decide what should be done when the failure actually happens. No one wants to fail, and many forget to plan for it.
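To make “coding in” fail safe concrete, here is a minimal sketch in Python. It assumes each step of a rollout can be paired with an action that undoes it. The names run_with_backout, apply and undo are hypothetical and used only for illustration. They do not come from any particular tool.

```python
from typing import Callable, List, Optional, Tuple

# Each step is (name, apply, undo). All three are illustrative assumptions.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_backout(steps: List[Step]) -> Tuple[bool, Optional[str]]:
    """Run steps in order. If one fails, undo the completed steps in reverse.

    Returns (True, None) on success, or (False, name_of_failed_step).
    """
    completed: List[Step] = []
    for step in steps:
        name, apply, _undo = step
        try:
            apply()
        except Exception:
            # Decided in advance: on failure, put things back the way they were.
            for _name, _apply, undo in reversed(completed):
                undo()
            return False, name
        completed.append(step)
    return True, None


# Usage: the second step fails, so the first step is undone.
installed: List[str] = []


def failing_migration() -> None:
    raise RuntimeError("migration failed")


ok, failed_step = run_with_backout([
    ("install", lambda: installed.append("v2"), lambda: installed.remove("v2")),
    ("migrate", failing_migration, lambda: None),
])
```

The point is not the code itself. The point is that the undo for each step was written before the rollout started, not improvised after the failure.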
Planning That You Never Want to Use
This type of planning is called many things. It may be called a Contingency plan, Backout plan, Backup plan or Rollback plan. Any way you title it, I am talking about what you do when things go off track and are irreparable, unsalvageable, and wrong. You have seen this happen. Something that should be easy does not go so easily. Something that has worked a hundred times goes wrong on attempt one hundred and one. It may be a small thing, or a major project. I want to focus on the major derailments, the ones that need to be thought through before you begin the initiative. With smaller ones, you just try again. Or you do some troubleshooting and then try again. But the major ones sometimes go off track and can’t be salvaged. What do you do then? You need to add some time to the positive planning phases to think about the negative side, when things may go wrong. You may even want to delay your initiative if your back out plan is not ready yet. Is it worth the risk to start an initiative when there is no escape hatch?
Define what might need to be done if things do not go as planned. Discuss what the back out strategy might be. Is it simply starting over with an adjusted plan? Is it stepping back the software rollout? Is it updating other hardware and swapping out what you were trying to upgrade? And how long will it take to execute your back out plan? If you are approaching a deadline for use of the hardware or software you are working on, can you step back and get it usable again by the deadline? You need to get agreement on the back out plan and timeline before you start the initiative.
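The timeline question is simple arithmetic: the latest moment you can call the abort is the deadline minus the time the back out takes, minus whatever cushion you want. Here is a minimal sketch in Python. The function names and the sample times are assumptions made up for illustration.

```python
from datetime import datetime, timedelta


def latest_abort_time(deadline: datetime,
                      backout_duration: timedelta,
                      safety_margin: timedelta = timedelta(0)) -> datetime:
    """Latest moment the back out can start and still finish by the deadline."""
    return deadline - backout_duration - safety_margin


def can_still_back_out(now: datetime,
                       deadline: datetime,
                       backout_duration: timedelta,
                       safety_margin: timedelta = timedelta(0)) -> bool:
    """True if starting the back out now would finish by the deadline."""
    return now <= latest_abort_time(deadline, backout_duration, safety_margin)


# Hypothetical example: users need the system at 8:00 AM Monday,
# the back out takes six hours, and the team wants a one-hour cushion.
deadline = datetime(2024, 1, 8, 8, 0)
abort_by = latest_abort_time(deadline, timedelta(hours=6), timedelta(hours=1))
# abort_by is 1:00 AM Monday, January 8
```

Agree on that abort-by time with everyone before you start, so nobody has to work it out in the middle of the night.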
That way everyone knows that failing is just a temporary setback and not total failure. You have a plan for contingency that can be used in a worst-case scenario.
The Point of No Return
Sometimes your initiative has a point of no return. Fail safe planning takes that into account and has a decision point for going beyond the point of no return. When you approach and then pass the point of no return, everyone should know you are all in. Failing after that point means not having the hardware or software functional for an indeterminate period of time. Sometimes you have no choice. You have to move forward. Sometimes you just need to press on.
What Does Failure Look Like?
A back out plan assumes that once the failure has happened, it is unrecoverable. It is when you are stuck. Dead in the water. No possible hope is left for making progress and you have to turn around. This is not the decision to go to Plan B. Plan B might be part of your rollout as an alternative to the optimal Plan A. Plan A and Plan B, or even C, might work, and you can divert to another path when A or B stalls. But an “abort” is a call that is made when everything is failing. Then you have to put things back the way they were before you started.
Make the Call – Abort
Time to abandon ship. I am using every metaphor or simile I can to convey that the last vestiges of hope are gone. Time to call it quits. Abort. Pull the rip cord. Stop the music. When this happens, you divert to the back out strategy and try to put Humpty Dumpty back together again. When you find that the options list is exhausted and there seem to be no more legitimate ideas to solve the dilemma and get back on track, make the call. Don’t chew up too much time with valiant attempts that can’t pan out. The best plan now is to recover, regroup and retry later. There is no shame in failing when you have made a good plan and executed it well. Some things just do not go your way. Give it some time, think it through again and reinitiate later. Tomorrow is another day (I am always quoting movies).