Human Error and Organizational Resilience

Written by Jim Martin, CRSP, CUSP, CCPE on April 13, 2020. Posted in Leadership Development.

From 1980 through 2010, safety performance emphasis was on accident prevention through the application of controls. We learned about the hierarchy of controls (elimination, substitution, engineering controls, administrative controls and personal protective equipment) and the multiple barrier principle (use several controls in case one or two fail so there will always be something to protect you). The Institute of Nuclear Power Operations has defined “defense in depth” as “the overlapping capacity of various defenses to protect personnel and equipment from human error. If a failure occurs with one defense, another will compensate for that failure, thereby preventing harm. The four lines of defense – engineered, administrative, cultural, and oversight controls – should work together to anticipate, prevent, or catch active errors before suffering a significant event.” This thinking took us a long way in improving safety, and most companies experienced significant reductions in incident rates, severe accidents and fatalities.

During that period, and because of that success, most utility companies started to target zero injuries as part of their safety performance improvement programs. This led to an almost exclusive focus on a single number: the all injury rate or the total recordable injury rate. As a result, companies were able to achieve rates of less than 1.0 (that is, fewer than one recordable injury per 200,000 hours worked), which, in turn, led to the belief that they were ultra-safe organizations where nothing really bad could happen. But history has demonstrated that, even in those high-performing organizations, disasters and fatalities can and do still occur. As James Reason taught us in the 1990s through his Swiss cheese model, even multiple barriers can fail under the wrong circumstances, leading to accidents and loss.
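For readers who want to see how the single number discussed above is typically derived, here is a minimal sketch of the conventional incidence-rate arithmetic: recordable injuries multiplied by 200,000, divided by total hours worked. The 200,000-hour base is roughly 100 full-time workers for one year. The function name and the example figures are hypothetical and used only for illustration.

```python
def total_recordable_injury_rate(recordable_injuries: int, hours_worked: float) -> float:
    """Return recordable injuries per 200,000 hours worked."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return recordable_injuries * 200_000 / hours_worked


# Hypothetical company: 4 recordable injuries over 1,000,000 hours worked.
rate = total_recordable_injury_rate(4, 1_000_000)
print(f"{rate:.2f}")  # 0.80
```

A rate below 1.0 says only that recordable injuries were rare over the hours counted. It says nothing about how close the organization came to a catastrophic event, which is the limitation this article describes.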

The Deepwater Horizon explosion and oil spill in 2010, which killed 11 workers, is a terrible reminder of this fact. British Petroleum executives were aboard the Deepwater Horizon celebrating seven years without a lost-time accident when the disaster occurred. So, we have learned that a good safety record does not mean that a company is completely safe.

I think we can all agree that workers being killed or seriously injured on the job is simply unacceptable. Many of us have come to believe that, given the tools, equipment, facilities, procedures and processes that we have available to us today, we should be able to work safely and without serious accidents.

Why, then, do these catastrophic accidents still occur?

Accident Causation and Blame Culture
For those same three decades – 1980 to 2010 – root cause analysis was the primary method of investigating incidents. This approach assumes that accident causes are linear in nature. That is, accidents are the result of a sequence of events that can be explained by asking “why” and answering “because.” We believed that we could explain accidents by working back from the event to the root of the problem. The difficulty with this approach is that it is influenced by hindsight bias: once we know the outcome, the events leading up to it seem far more predictable than they were at the time. Thus, we can always find a way to explain why the accident should not have occurred; this has led to the belief that all accidents are preventable and to the zero-injuries approach to safety.

What this approach does not recognize is that the conditions and circumstances that exist at the time of the incident almost never line up with the hindsight view. It also does not recognize that the decisions and actions of workers at the time of the incident almost always make perfect sense in the situation and with the conditions that were present. When asked, “Why did you do what you did?” in an investigation, the worker almost always responds, “Because it seemed like a good idea at the time” or “I have been doing it this way for years and it always worked before.”

Putting all of this together, many organizations have come to believe that the only reason accidents still occur is the human factor – that it is the fault of the individual – and this belief leads to blame culture.

The above approach has come to be labeled “Safety-I thinking,” which assumes that accidents can be prevented through the use of controls that are designed to compensate for human failure. It also assumes that accidents happen largely because of the limitations of workers and their inability to perform work without making errors.

The reality is that workers do not come to work with the intention of having an incident. We now know that the behaviors that result in an incident are the same behaviors involved in successful work. The only difference is the outcome.

Our Adaptability and Biases
Our study of human evolution has taught us that we have evolved from our hunter-gatherer ancestors. Our ability to adapt, to recognize situations quickly and accurately, has allowed us to not only survive but to become the dominant species on the planet. Today, we are no longer functioning as hunter-gatherers in our modern social settings. We have modified our environment so much that the abilities that have evolved over time – ones that have allowed us to be successful – have now become disadvantageous in certain circumstances, resulting in behavioral mismatches.

The field of neuroscience provides evidence that the area of the brain known as the amygdala processes billions of bits of information each day. We have survived and become the dominant species on the planet because of our ability to almost instantly determine what all this information means. This fast response is partly due to the hardwired pathways in the brain that help us categorize the information into recognizable patterns that we call biases. Most of the time, these biases provide a correct interpretation of the information and allow us to make the right decision and take the correct action. But sometimes they don’t, and in these instances, we call it error.

Our biases generally allow us to be efficient in decision making and are almost always correct – until they are not. The last time I looked up “biases,” I found a list of 34 of them, and the list keeps growing. Of those, I have found the following four biases to be the most prevalent in incident situations.

1. Anchoring bias. We tend to be overly influenced by the first piece of information that we hear, a phenomenon referred to as the “anchoring bias” or “anchoring effect.”

2. Confirmation bias. This is the tendency to search for or interpret information in a way that confirms our preconceptions.

3. Pattern-matching bias. This bias is the tendency to sort and identify information based on prior experience or habit. Associated with rule-based behavior, it is the tendency for the mind to seek out recognizable patterns in an attempt to quickly find a solution.

4. Status quo bias. This is the tendency of people to like things to stay relatively the same.

Given what we now know about these biases and their effects on decision making and errors, what can we do to ensure that they don’t lead to accidents? We can reduce the potential for error with strategies such as self-check, procedure use and adherence, and three-way communication. But if we also acknowledge that errors will be made and accidents will happen, we can do the things necessary to allow us to fail safely – and that is the definition of resilience.

Safety-II and Resilience
There is a new school of thought called Safety-II. The term was coined by Erik Hollnagel, a world-renowned professor and champion of the Resilience Engineering movement. According to Hollnagel, “Safety-I is freedom from unacceptable risk. … whereas Safety-II is the ability to succeed under varying conditions.”

Resilience is the capacity to respond to, absorb, adapt to and recover from a disruptive event. Here is an example: Many automotive manufacturers are now designing cars with the recognition that people will get into accidents. The cars are designed so that when an accident occurs, the people inside the vehicle will be protected from serious injury. This is achieved through a combination of the car’s ability to absorb the energy of impact along with the safety systems – such as seat belts and air bags – designed to hold the occupants in place. The underlying question being asked with this approach is, how well can we absorb the event and recover without a major loss or injury? The question is no longer only, how do we prevent or avoid accidents?

Building Resilience
To embrace this new way of thinking and build resilience in our organizations, we need to do the following:

Stop focusing on reactive measures (lagging indicators) as the prime indicator of performance. If the only thing we do is look in the rearview mirror, we will never see what is coming at us and we will have catastrophes.
Stop blaming workers for accidents. Most incidents involve human error, but error is unintentional and a normal part of being human. Everyone commits errors, even the most skilled and experienced workers. Besides, blame is just a way to feel like we have done something without actually fixing anything.
Don’t throw out all of the good things we have been doing. Identifying hazards and putting controls in place to mitigate the risk are sound practices, and we need to keep doing them.
Continue to build defense in depth. Having multiple controls (barriers) in place will continue to help reduce the potential for incidents.
Plan for incidents to happen, not just for how to prevent them. If we assume incidents will happen and put systems and processes in place so that significant loss can be prevented, then we have built resilience into our systems.

Conclusion
If we can all embrace this shift in thinking, we will get better at planning for success, learning from incidents and building the resilience necessary to eliminate fatal and serious injuries.

About the Author: Jim Martin, CRSP, CUSP, CCPE, is a health and safety professional and professional ergonomist with more than 38 years of experience in the electrical utility industry. He has worked in power-line construction as well as with plant operations and maintenance. Much of Martin’s knowledge and experience with human performance and human error stems from his years working as an ergonomist and with nuclear power systems.

APRIL-MAY 2020
