Even the safest among us will occasionally veer off the road to safety excellence and encounter an incident. When this happens, the best management practice is to identify and correct the cause(s) so that you can get back on track and avoid future mishaps.
The process we employ for doing this is called Incident Analysis. At the heart of Incident Analysis is the problem-solving method known as Root Cause Analysis. Root Cause Analysis, like a bastion on the road to safety excellence, is the most effective method available for identifying and closing gaps in your safety defenses and preventing future incidents.
The term Root Cause Analysis has various meanings. In the broadest sense, it refers to the overall incident analysis process. In a narrower sense, it refers to the various methodologies used for identifying the causal factors. In the narrowest sense, it refers to the step in a Causal Factor Analysis that takes you beyond the surface causes to discover the systemic causes.
In the April 2009 edition of “Incident Prevention,” in an article titled “Incident Analysis: Beyond the Surface,” I suggested a two-phase incident analysis process. The first phase is a Primary Event Analysis. This phase determines what happened (i.e., the primary event and its consequences) and how it happened (i.e., the direct or immediate cause). The second phase is a Causal Factor Analysis, which determines why the incident happened. It starts by listing the indirect or proximate causes and then performs a Root Cause Analysis to determine the underlying or approximate causes and the root or ultimate causes (see Figure 1: Incident Analysis Process).
Another way to visualize the process is in three parts – a Consequence Analysis, Surface Cause Analysis and Root Cause Analysis. Both of these conceptual frameworks distinguish Root Cause Analysis as a component of the overall process, placing emphasis on going beyond the surface to identify the systemic causes.
The objective of a Root Cause Analysis is to peel away the surface and get to the core of the problem. When you address only the surface causes, the likelihood of recurrence is high. When you address the root causes, the likelihood of recurrence is significantly reduced. The deeper you go in the causal chain, the greater the sphere of influence your corrective actions will have on improving overall safety. Systemic causes result in systemic failures; systemic solutions result in systemic improvements. You get the biggest return on your investment when you identify and correct the root causes.
A root cause is like that first step down the proverbial “slippery slope” that will eventually and inevitably lead to an incident. It is the fundamental or basal cause in a causal chain. Root causes can be systems-based, either a systems design or implementation failure. Failing to develop or poorly designing applicable safety programs will have a negative and widespread effect on safety. Failing to implement or inadequately enforcing them will have a similar effect (see Tables A and B).
Tables A and B
Root causes can also be culture-based. The organization’s safety culture is like an energy field, either positively or negatively charged, that influences how people think, feel and act within the organization regarding safety. On the visible level, culture is displayed in the conditions and behaviors of an organization. On the invisible level, culture is composed of the attitudes and beliefs of the organization, and even deeper, its values and norms.
There are a variety of methodologies available to help identify root causes. Some methodologies are better suited for specific applications, but most can be universally applied. It is not uncommon, especially for serious incidents, to use more than one methodology in the Incident Analysis process. Becoming familiar with these methodologies will greatly enhance your Root Cause Analysis skills.
Root Cause Analysis Methodologies
•Ishikawa “Fishbone” Diagram
•Apollo Root Cause Analysis
•Events & Causal Factor Analysis
•Multi-Linear Events Sequencing
•Fault Tree Analysis
•Management Oversight and Risk Tree
•Event Tree Analysis
•Safety Management and Organization Review Technique
•Human Factors Analysis and Classification System
For many incidents, a simple flowchart can be used to organize the process and display the results of a Root Cause Analysis. The analysis begins with a Primary Event Analysis, including a statement of the direct cause in terms of both an unsafe condition and an unsafe action. The analysis continues with a Causal Factor Analysis that identifies various indirect causes and performs a Root Cause Analysis on them. This analysis uses the Why-Why technique, which simply asks the question “why” to develop the causal chains down to their root causes. The causal factors shown are supported by evidence obtained during the investigation (see Figure 2: Root Cause Analysis Flowchart).
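For readers who keep their analyses in software, the Why-Why technique maps naturally onto a tree: each causal factor holds the answers obtained by asking “why” about it, and a factor with no further answers marks the end of that causal chain. The Python sketch below is one possible way to record this, not a prescribed tool. The class name Cause, the function causal_chains and the sample incident are all hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    """One causal factor; `whys` holds the answers to asking "why" about it."""
    description: str
    whys: List["Cause"] = field(default_factory=list)


def causal_chains(cause, path=()):
    """Yield each chain from `cause` down to a factor with no further "why"."""
    path = path + (cause.description,)
    if not cause.whys:
        # No deeper answer recorded: this chain ends here.
        yield path
        return
    for deeper in cause.whys:
        yield from causal_chains(deeper, path)


# Hypothetical incident data, invented for illustration only.
direct_cause = Cause(
    "Worker contacted energized equipment",
    [
        Cause(
            "Equipment was not de-energized",
            [
                Cause(
                    "Lockout procedure was not followed",
                    [Cause("Procedure compliance is not audited")],
                )
            ],
        ),
        Cause(
            "Worker was not wearing rubber gloves",
            [Cause("Gloves were not available on the truck")],
        ),
    ],
)

for chain in causal_chains(direct_cause):
    print(" -> why? ".join(chain))
```

Each printed line is one causal chain, read from the direct cause down to the deepest factor recorded. Whether that last factor is truly a root cause remains the analyst's judgment; the code only organizes what the investigation found.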
Although the analysis could be more extensive, it shows just how much information can be obtained and displayed with a minimal amount of effort. It reveals a number of gaps in safety defenses, including human, systems, engineering and culture-based failures. This provides the basis for determining how to close those gaps and fortify your safety defenses.
Some people believe that there is a single root cause associated with an incident. They argue that if that one cause had not occurred, the incident would not have occurred. Therefore, you only need to analyze long enough to find that single cause and then address it to prevent future incidents.
Most safety professionals agree that there can be several root causes associated with an incident. Some believe that 10 or more causal factors come together to cause a serious injury. Others believe that 27 factors, on average, directly and indirectly contribute to serious accidents. Catastrophic accidents can have hundreds of factors that directly or indirectly contribute to them. Although the degree of causality might be different for each factor, prudence suggests that they all contribute to some degree and are worthy of consideration for corrective action.
Logically, you could pursue causal chains indefinitely or at least back to the beginning of time. However, eventually your efforts will reach a point of diminishing returns. How do you know where that point is? Answering the question “why” five times is usually considered sufficient. The following guidelines can help determine when to stop analyzing a causal chain:
•When you reach fundamental systems or cultural causes
•When you reach a cause that you cannot influence
•When other causal paths are more productive
•When causal paths are duplicated or can be combined
•When you lack the necessary information or knowledge to continue
•When the facts do not support a potential cause
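The guidelines above can also be written down as a simple checklist function. The following Python sketch is only one way to do so: the names CauseAssessment and reason_to_stop, the field names and the order in which the checks are applied are all assumptions made for illustration, and every input is still a judgment the analyst must make.

```python
from dataclasses import dataclass
from typing import Optional

MAX_WHYS = 5  # rule of thumb from the text: five "whys" is usually sufficient


@dataclass
class CauseAssessment:
    """Analyst's judgment about one cause; all field names are illustrative."""
    depth: int  # number of "whys" asked to reach this cause
    supported_by_facts: bool = True
    is_systems_or_cultural: bool = False
    can_influence: bool = True
    duplicates_another_path: bool = False
    other_paths_more_productive: bool = False
    have_information: bool = True


def reason_to_stop(a: CauseAssessment) -> Optional[str]:
    """Return the first guideline that says to stop, or None to keep asking why."""
    if not a.supported_by_facts:
        return "the facts do not support this cause"
    if a.is_systems_or_cultural:
        return "reached a fundamental systems or cultural cause"
    if not a.can_influence:
        return "reached a cause you cannot influence"
    if a.duplicates_another_path:
        return "path duplicates or can be combined with another"
    if a.other_paths_more_productive:
        return "other causal paths are more productive"
    if not a.have_information:
        return "lack the information or knowledge to continue"
    if a.depth >= MAX_WHYS:
        return "asked 'why' five times"
    return None


# Hypothetical usage: a systems-level cause reached after three "whys".
print(reason_to_stop(CauseAssessment(depth=3, is_systems_or_cultural=True)))
```

A return value of None means none of the guidelines applies yet, so the causal chain should be pursued further.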
Carrying the causal chain out too far is non-productive at worst, while stopping it too soon is detrimental at best. Too often people stop at a convenient or “most likely” cause, one that is easy to correct and will satisfy the expectations of upper management. This can result in missing key underlying causes or overlooking entire causal chains.
One example of this is stopping at the “failure to follow procedures” level. In most instances, this type of failure is indicative of underlying problems associated with safety systems designed to ensure compliance. If the expectation for the analysis is set too low, the root cause level may not be reached. This will result in underlying system defects not being corrected, allowing them to contribute to future incidents.
Root Cause Analysis requires a good understanding of incident causation theories, as well as safety management principles, so you can know the potential causes to look for. You also need to keep an open mind regarding where you will find those causes. You must be willing to look just as hard at your safety systems and culture as you do at the actions of employees, because one of the reasons for developing effective safety systems and cultivating a positive safety culture is to provide a defense against human error.
The secret to being successful at Root Cause Analysis is being passionate about it. If you possess a burning desire to protect the health and safety of your employees, and to prevent future incidents, then you will succeed at doing so.
About the Author: Donald F. Fritz has more than 25 years of experience in the utility industry and has been managing EH&S full time for the past 12 years. He is a Certified Safety Professional, a Certified Safety and Health Manager, and a Professional Member of the American Society of Safety Engineers (ASSE).