‘Fail’ doesn’t have to be a four-letter word

Projects fail. Companies crash and burn. Screws fall out all the time; the world is an imperfect place. Just because it happens doesn’t mean we can’t do our best to prevent it or—at the very least—to minimize the damage when it does. As a matter of fact, embracing failure can be one of the best things you do for your organization.

First, let’s start with a few extreme examples of how things can go wrong…

A grim roll call

  • Despite warnings from many engineers, the space shuttle Challenger was allowed to launch with a faulty design on its oxygen tanks. The thinking, paraphrased, was “it will only be dangerous in these specific and unlikely circumstances…” Those circumstances, unfortunately, occurred.
  • A stuck valve, a signal light known to be ambiguous and lack of training led to the near meltdown of the TMI-2 cooling tower at Three Mile Island, to date still one of the worst nuclear disasters in history.
  • The Deepwater Horizon oil rig, off the coast of Louisiana, exploded in 2010 due to a number of factors: A cementing procedure was not followed correctly, allowing gas to escape from the well…on a windless day when a man was welding nearby.

Each of these tragedies has something in common: it could have been avoided. In each case there were known problems or procedures which were ignored and in each lives were unnecessarily lost. Each was the result of a compounding of small issues and oversights which unfortunately combined in just the right proportion at just the right (or wrong) time, small issues which were insignificant…until they weren’t.

Most likely lives are not on the line in your business but that doesn’t make your project failures feel any less personally and professionally devastating at times. Like these three tragedies, it’s likely that your failures can also be avoided.

Before suggesting how to avoid project failures, we should take some time to understand some of the factors which lead to them in the first place. There are many reasons why a project might fail, but two of the biggies are Culture and Cognitive Biases.

Cause #1: Culture

Do you like failing? No, nor do I. In general, humans do not like to fail. If things don’t go as expected we tend to get very cranky and sometimes irrational. This is unfortunate as failure is one of the most effective teachers we have.

Many project cultures are openly hostile to failure in any degree. In these cultures, backing or working on a project which doesn’t succeed as forecast can be very unhealthy for one’s career. Mistakes—even honest ones—are punished. This frequently leads to errors and oversights being swept under the carpet rather than being brought to light and addressed promptly. As we’ve seen in the examples above, ignoring the small things can have devastating results.

People in these cultures go to great lengths to avoid association with failure. Frequently, they’ll shelve a new and potentially profitable endeavor rather than take the chance it may fail. This risk aversion stymies innovation and company growth. As well, when a new project is pursued and is not going as planned the people associated with it will not report this underperformance. Doing so is often perceived (at best) as rocking the boat or (at worst) hitching yourself to the failing project. Therefore the project chugs along, devouring company resources, rather than being cut as it should be. All because of a culture which is hostile to failure.

Cause #2: Cognitive Biases

We all come with mental software packages which help us cope with life, allowing us to form judgments and take actions in situations both mundane and exceptional. Usually these work as expected and everything is just fine. But then there is the edge case, where the software doesn’t work quite as it should. This is a cognitive bias, “…a pattern of deviation in judgment that occurs in particular situations, which may sometimes lead to perceptual distortion, inaccurate judgment, illogical interpretation, or what is broadly called irrationality.

While many cognitive biases have been identified, there are four which most commonly contribute to instances of failure. You will all recognize them:

  • Normalization of deviance: The tendency to accept anomalies as normal. “Yeah, we get that error message all the time. Nothing’s ever blown up so we just ignore it now.”
  • Outcome bias: The tendency to focus on the outcome rather than the complex process leading up to it. “Sure, some things went wrong during the code deploy but the product shipped and isn’t that what really matters?”
  • Confirmation bias: The tendency to favor evidence which will support existing beliefs. “No, see, I know Windows would suck for this purpose because I read this article by RMS…”
  • Fundamental attribution error: The tendency to put too much blame for failure on personal rather than situational factors. “We fired the out-sourcers because they were hacks who didn’t know anything.” (Actual: the out-sourcers weren’t given the information and support required for success)

Suggestions

The possible causes for project failure are too varied and unpredictable to be able to avoid them all. However, given what we know it’s possible to minimize the impact and occurence of many potential failures.

First and foremost, it’s necessary to change the project and/or company culture so that it encourages people to report problems, no matter the size. The saying of this is, of course, much easier than the doing. Changing a culture is difficult but, at least in this case, very worthwhile. You can only solve those problems you know. The ones which go unreported end up costing dearly in money, time, effort and team morale.

Any cultural change must have honest and complete support at the highest levels of the organization. An effort like this can only succeed with commitment and leadership. Management must understand why the change is needed and lead by example. Only then will other members of the team feel comfortable enough to pull back the curtain and expose the problems which are holding the organization back.

Leadership should also learn to celebrate rather than blame those who bring errors, problems and failures into the light. Such actions should be seen for what they are: successfully preventing the waste of vital resources. Most people in tech have heard the phrase, “Fail early, fail often.” Few live by those words or reward those who do.

Many organizations will perform a postmortem on a project which has failed. While useful, these processes don’t even show half of the picture. The best way to identify issues which may put future projects at risk is to postmortem all projects, successful or not. Thanks to outcome bias, a near-miss will look like a success and typically will then escape a postmortem scrutiny. This leaves a raft of issues unidentified, ready to appear and compound in future efforts.

But even this won’t help you identify potential problems when you need to. While locating and anaylzing them after they’ve occured is, of course, valuable, wouldn’t it be better to spot them before they become real problems in the first place? Part of the cultural changes which must occur is an organization-wide commitment to spot and report potential problems as quickly as possible. This is more likely to happen if people are asked to use simple and familiar tools to make the reports. Rather than re-invent the wheel or roll out yet another tool for team members to learn, I suggest that the organization’s existing issue tracking system be used for this purpose. After all, while these problems may not be software bugs they are issues which need resolving. Simply create a new database/project/what have you in your issue tracker then open it up to everyone in the organization. It does little good to report issues if people are not able to see them, after all.

Reporting the issues isn’t enough. If people believe that the reports go into a black hole, where they are never viewed let alone addressed, they will stop creating them and you’ll be right back where you started. These issues should be analyzed to determine the root causes for them. When resolving a problem, all too often people will scratch the surface and leave it at that. “This build failed because $team_member didn’t follow the procedure.” That’s not a reason for the problem; that’s a symptom. The true reason is buried somewhere in there. Why did $team_member not follow the procedure? Is the procedure wrong? Was there some exception? Was $team_member getting undue pressure to work faster, causing corners to be cut? Did $team_member not know that the procedure exists or that it should be used in this situation?

The statement, “This build failed because $team_member didn’t follow the procedure” is a presumptive and possibly incorrect placement of blame. As you can see, it is much more likely the case that the system within which $team_member is working is flawed than that this person is incompetent. The urge to point fingers is strong, but a vital part of the cultural change required is to discourage laying blame without doing analysis. Take an hour to speak with everyone involved to figure out the true underlying reasons for the error. Never punish but in the cases of gross negligence or repeated incompetence. Let people know it’s not only OK to make mistakes, it’s entirely natural to the human condition. Not only will you find that people start reporting and addressing issues before they become real problems, you’ll also find that team morale improves as people realize that they’re respected and trusted.

Once you’ve identified the underlying reason(s), make sure to correct and communicate. It does no good to analyze a problem if you’re not going to do something with what you discover. Fix the procedure. Provide additional training. Add a new Nagios warning. Modify the schedule to allow time to do things right. Bring in a specialist. Do whatever needs to be done to be sure the issue never happens again. And, of course, it’s vital that you communicate not only the necessary fixes but why they were necessary in the first place. Team meetings are a good way to do this but don’t forget that issue tracker. Any resolution should be captured here for posterity. Just like adding comments to a complicated piece of code, Future You will be grateful if you document both the problem and its solution.

Other Resources

As I mentioned above, there is a myriad of ways in which a project can fail. This article only starts to address that and some of the things you can do to prevent or minimize damage. If you’re interested in learning more, here are some of the resources I’ve stored up on the subject. Most of them are from HBR, so you may need to pay to access them. You can also probably get them from your local public or university library.