Data quality root cause analysis – 5 whys

The importance of identifying and addressing the root cause of a data quality issue should never be overlooked. In this series of articles I will cover the most important techniques that help you uncover the root cause. This week focuses on the 5 whys root cause analysis technique.


The 5 whys is an iterative interrogative technique used to determine the root cause of a particular issue.


Very popular in the world of lean development, this questioning technique repeatedly asks “Why?” to dig deeper into the causes of an issue until it reaches the root cause. The answer to the fifth “why” usually uncovers a broken process, procedure, or policy.

Fun fact

The technique is attributed to Sakichi Toyoda and was popularized in the 1950s by Taiichi Ohno, the architect of the Toyota Production System. Ohno encouraged his team to dig into each problem that arose until the root cause was identified. He is often quoted as saying: “Ask ‘why’ five times about every matter.”

When to use

  • While conducting a workshop to identify the possible causes of a simple issue
  • If you need to isolate a single root cause, not multiple
  • When you can easily identify the stakeholders and subject matter experts tied to the issue
  • When you want to get an initial insight and a starting point for at least one cause of an issue


Advantages

  • A simple tool which does not require training
  • Works well with other methods and techniques, such as the fishbone diagram


Disadvantages

  • The answers might not be repeatable if you go through the same exercise with other stakeholders
  • Only effective when the answers come from stakeholders involved in at least one step of the process
  • There’s a tendency to single out one root cause, even if there might be multiple
  • Does not work well for complex problems – those need a more detailed analysis technique



Steps to develop it

1. Gather main stakeholders: Once the data quality issue is identified, identify the main stakeholders affected by the issue or taking part in the processes that create it. For example, if the issue is “there are too few customer emails”, the stakeholders might be data stewards, IT, marketing, finance, etc., depending on the processes through which this data is collected, maintained and disseminated.

2. Select a session leader: For each session, randomly select a leader to ask the 5 whys. Asking the same question more than once about the same issue can start to seem aggressive, so rotating the leader helps defuse potential tension as much as possible. All the leader needs to do is ask the question and take notes, or designate someone else to take the notes. For some of the more difficult topics, a facilitator can be beneficial.

3. Ask “why?” five times: Each question might yield multiple answers. You can go deeper with the next “why?” into each of these answers, or select the one that seems to be the biggest culprit. If the data quality issue persists, revisit the other answers in a future session. Make sure the answers are:

  • based on facts and knowledge
  • based on processes, not people – For example, you don’t want the answer to be “Because John does it that way.”

4. Determine solutions: Go through the answers at the deepest levels and come up with corrective actions. Responsibilities will be assigned as part of your internal data quality stewardship procedures.
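
The notes from steps 3 and 4 can be kept as a small tree: each answer to a “why?” may have its own follow-up answers, and the answers with no further follow-ups are the root cause candidates. Below is a minimal Python sketch of that idea. The `Cause` class, the `deepest_causes` helper and every answer in the sample session are hypothetical and invented for illustration only; they are not taken from a real session.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    """One answer to a "why?"; children are the answers to the next "why?"."""
    answer: str
    children: list[Cause] = field(default_factory=list)


def deepest_causes(node, depth=0):
    """Return (depth, answer) pairs for every answer with no follow-up."""
    if not node.children:
        return [(depth, node.answer)]
    found = []
    for child in node.children:
        found.extend(deepest_causes(child, depth + 1))
    return found


# Hypothetical session notes, made up for this sketch.
issue = Cause("There are too few customer emails", [
    Cause("The email field is often left blank at sign-up", [
        Cause("The sign-up form does not require an email", [
            Cause("No data entry policy defines required fields"),
        ]),
    ]),
    Cause("Emails are lost when records are merged"),
])

candidates = deepest_causes(issue)
for depth, answer in candidates:
    print(f"why #{depth}: {answer}")
```

A branch that stops early, such as the second one here, is a reminder to revisit that answer in a future session if the issue persists.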


Here is a 5 whys example that traces why there are not more customer emails in your CRM database.

[Figure: 5 whys example]


Tips

  • You don’t need to stop at 5. You can ask “why?” a few more times until you get to the root of the problem
  • Don’t jump to conclusions once you hear each answer. Instead, move quickly to the next “why?”
  • Instead of the simple “why?” question, you can ask “why do you think this is happening?”
  • Pick stakeholders who know the process very well in order to get the best answers


You don’t need any specific tools to document your findings; simple note-taking is enough. A physical whiteboard and marker are usually recommended. If you identify multiple root causes, you can use the fishbone diagram to visualize them. Furthermore, you can use Pareto analysis to help identify the top portion of causes that need to be addressed.
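
As a rough illustration of the Pareto step, the sketch below ranks causes by how often they occur and keeps the top ones until they cover a chosen share of all occurrences. The `pareto` function, the 80% threshold and the counts are assumptions made for this example, not figures from a real CRM.

```python
def pareto(cause_counts, threshold=0.8):
    """Return (cause, count, cumulative share) for the top causes that
    together reach `threshold` of all occurrences."""
    total = sum(cause_counts.values())
    if total == 0:
        return []
    ranked = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0
    for cause, count in ranked:
        if cumulative / total >= threshold:
            break  # the causes kept so far already cover the threshold
        cumulative += count
        selected.append((cause, count, cumulative / total))
    return selected


# Hypothetical number of affected records per root cause.
counts = {
    "Email not required on sign-up form": 120,
    "Emails lost when records are merged": 50,
    "Typos at data entry": 20,
    "Customer declined to share email": 10,
}

top_causes = pareto(counts)
for cause, count, share in top_causes:
    print(f"{cause}: {count} records, cumulative {share:.0%}")
```

With these made-up counts, the first two causes cover 85% of the affected records, so they would be the ones to address first.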


About the author 

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
