In the six weeks of my course, Data Quality Improvement at BCIT, I try to give my students an understanding of how to ‘discover’ the instances of poor data quality in the data sets they are responsible for, so that they can begin to help their teams ‘monetize’ the effects of that poor data quality. Every organization is different, so there are no ‘out of the box’ solutions that fit everyone; instead, I show my students how they can use specific tools and techniques to support the cost-benefit analyses behind enabling continuous improvement in data quality.
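To make that ‘discovery’ step concrete, here is a minimal profiling sketch, not a specific course tool. It assumes a pandas DataFrame of customer records; the column names (customer_id, email, postal_code) and the validity rules are purely hypothetical. The point is simply that defect counts like these become the raw material for a cost-benefit analysis.

```python
# A minimal data-profiling sketch: count some common data quality defects
# in a pandas DataFrame. Column names and rules here are hypothetical.
import pandas as pd

def profile_defects(df: pd.DataFrame) -> dict:
    """Return simple defect counts that can feed a cost-benefit analysis."""
    return {
        "rows": len(df),
        "missing_email": int(df["email"].isna().sum()),
        "duplicate_customer_id": int(df["customer_id"].duplicated().sum()),
        # Example rule: Canadian postal code format A1A 1A1 (space optional)
        "bad_postal_code": int(
            (~df["postal_code"].astype(str)
               .str.match(r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$")).sum()
        ),
    }

# Tiny illustrative data set
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@x.com", None, "b@x.com", "c@x.com"],
    "postal_code": ["V5G 3H2", "12345", "V6B1A1", "V7C 2K9"],
})
print(profile_defects(customers))
```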

Joseph Juran, a pioneer of quality management who defined quality as fitness for use, something that satisfies customer needs, said:

“In the USA, about a third of what we do consists of redoing work previously ‘done’.”


Juran’s stark ‘one third’ statistic came from his understanding of the inefficiencies built into the average organization’s business processes, and to a large extent many of those inefficiencies can be reduced with process improvement methods such as the Theory of Constraints, Lean and Six Sigma. Those same techniques, initially developed to improve manufacturing processes, should also be considered for improving the ‘manufacture of data’ in today’s organizations. If you think about it, most organizations are not so much focused on the manufacture of widgets as on the management of decisions based on the data produced by their business processes.

Juran’s ‘one third’ statistic is congruent with the chapter on the costs of poor data quality in Larry English’s “Information Quality Applied”, where English details statistical analyses showing that the costs of poor data quality run between 20% and 35% of the operating revenue of the average organization.

If the data produced by an organization’s business processes does not support its decision making, then its decisions will be sub-optimal. And if the data can be cajoled into supporting the critical decisions, but always has to be cleaned up, massaged and filtered before it is ‘safe for decision making’, then the organization is forced into an inefficient ‘data gathering and cleansing’ cycle that can build a dangerous time lag between when a decision should be made and when the data required to make it is finally clean enough to support it.


Here’s how you can estimate the cost of poor data quality in 5 simple steps.
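However you break the estimate into steps, it boils down to counting defects and attaching a dollar figure to each one. As a rough companion (not the official five-step method), here is a back-of-the-envelope sketch: multiply the number of defective records by the average rework cost per record, plus any downstream loss you can attribute to each defect. All of the numbers in the example are made up for illustration.

```python
# A hedged sketch of one common way to put a dollar figure on poor data
# quality: defective records x (rework cost + attributable downstream loss).
# The figures used below are illustrative, not benchmarks.

def cost_of_poor_quality(records: int,
                         defect_rate: float,
                         rework_cost_per_record: float,
                         downstream_loss_per_defect: float = 0.0) -> float:
    """Estimated annual cost of poor data quality for one data set, in dollars."""
    defects = records * defect_rate
    return defects * (rework_cost_per_record + downstream_loss_per_defect)

# Example: 1,000,000 customer records, 5% defective, $4 of clerical rework
# per defect and $10 of lost revenue per defect.
print(cost_of_poor_quality(1_000_000, 0.05, 4.0, 10.0))  # 700000.0
```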



IBM puts the cost of poor data quality in the USA at $3.1 trillion per year, which is roughly 15% of the USA’s GDP of $20 trillion. To paraphrase Everett Dirksen: “A trillion here, a trillion there, and pretty soon you’re talking about real money.”


So, it looks like the ‘ball park’ guesstimate for how much poor data quality is costing your company is between 15% and 35% of operating revenue, if your company is in the average range. Does that seem like a reasonable amount of money to waste? Is it reasonable to waste any money on poor data quality? Jack Olson, in his book “Data Quality: The Accuracy Dimension”, says that roughly half of the costs due to poor data quality can be recovered or mitigated; the other half is irrecoverable because data sets, interfaces and technologies are continuously evolving at both the local and industry level. Olson’s book is the only one Ralph Kimball has reviewed on Amazon, and after giving it 5 out of 5 stars, Kimball wrote, “This book is on my very short list of essential reading for data warehouse professionals.”
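To turn those percentages into something tangible, here is a tiny back-of-the-envelope sketch combining the figures quoted above: a cost of 15%–35% of operating revenue, of which (per Olson) roughly half might be recoverable. The $50M operating revenue figure is purely illustrative.

```python
# Back-of-the-envelope: apply the 15%-35% ballpark to a hypothetical
# operating revenue, then take Olson's "roughly half is recoverable".
operating_revenue = 50_000_000  # hypothetical annual operating revenue

for share in (0.15, 0.35):
    cost = operating_revenue * share
    recoverable = cost * 0.5  # Olson: about half of the cost can be recovered
    print(f"At {share:.0%}: cost ~ ${cost:,.0f}, recoverable ~ ${recoverable:,.0f}")
```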

So, without sounding facetious, can I ask a simple question? If you have not been keeping metrics on the quality of data in your organization, should you consider publishing a note to advise your stakeholders and/or shareholders that, due to lack of measurement, you do not know how much your poor data quality is costing them? Sounds extreme, doesn’t it? But we have all been hearing ideas about valuing an organization’s data and reporting that value on a regular basis. It just might happen. Look at the disruption caused by Europe’s General Data Protection Regulation (GDPR).

According to a March 2017 MIT Sloan article, many companies do not know the answer to the question “What Is Your Data Worth?”, but it seems reasonable to assume that if the data is of poor quality, it is literally worth less than if it were pristine. And we should do everything we can to keep those two words, ‘worth’ and ‘less’, from combining when an auditor assesses the value of our data!

  • Milan Kucera says:

    Dear Gordon, thank you very much for this article focused on the costs (impacts) of poor quality information (data). I agree with you, and for me this is the most important area of information quality management. It is in compliance with P. Crosby’s statement that “Quality is measured via costs and not indexes.”
    I did a few exercises in this area and I can confirm the general principle presented by P. Crosby (paraphrased): when you investigate the costs of poor quality for the first time, you find about 1/3 of them. My exercise shows that his view is correct, as I found roughly 1/3 of the total costs of poor quality. I think only money is important to management and can lead to a decision to focus on information quality.
    To identify those costs I use Larry English’s TIQM P3 process, which I have enhanced with other models for identifying costs so that I can compare the results from the different approaches.

    Finally somebody has opened up this important topic.

    • Gordon Hamilton says:

      Hi Milan, thank you for your insightful comments on this important topic. I am just a DQ teacher, and I really appreciate your years of experience implementing and measuring these costs. Your comments illustrate some really useful ideas and techniques that should be shared more widely. Would you consider writing a follow-up article on this topic to share on the LightsOnData.com blog? I’d be happy to help if you think that would be useful. Cheers, Gordon

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

About the author

Gordon Hamilton

Gordon Hamilton is an unapologetic Data Enthusiast, with years of experience integrating data quality into the data warehouse development process, modeling dimensional and 3NF data models, leading data migrations, analyzing data and helping customers find the signal in the noise. In his downtime, Gordon teaches Data Quality Improvement at BCIT, helps his former students ease Data Governance into their organizations and supports the new DAMA Vancouver Chapter. GHamilton@DataQuality.ca
