In the six weeks of my course, Data Quality Improvement at BCIT, I try to give my students an understanding of how to ‘discover’ the instances of poor data quality in the data sets they are responsible for, so that they can begin to help their teams ‘monetize’ the effects of that poor data quality. Every organization is different, so there are no ‘out of the box’ solutions that fit everyone; instead, I show my students how they can use specific tools and techniques to support the cost-benefit analyses behind enabling continuous improvement in data quality.
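To make that ‘discovery’ step concrete, here is a minimal profiling sketch, not a specific course tool. It assumes a pandas DataFrame of customer records; the column names (customer_id, email, postal_code) and the validity rules are purely hypothetical. The point is simply that defect counts like these become the raw material for a cost-benefit analysis.

```python
# A minimal data-profiling sketch: count some common data quality defects
# in a pandas DataFrame. Column names and rules here are hypothetical.
import pandas as pd

def profile_defects(df: pd.DataFrame) -> dict:
    """Return simple defect counts that can feed a cost-benefit analysis."""
    return {
        "rows": len(df),
        "missing_email": int(df["email"].isna().sum()),
        "duplicate_customer_id": int(df["customer_id"].duplicated().sum()),
        # Example rule: Canadian postal code format A1A 1A1 (space optional)
        "bad_postal_code": int(
            (~df["postal_code"].astype(str)
               .str.match(r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$")).sum()
        ),
    }

# Tiny illustrative data set
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@x.com", None, "b@x.com", "c@x.com"],
    "postal_code": ["V5G 3H2", "12345", "V6B1A1", "V7C 2K9"],
})
print(profile_defects(customers))
```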

Joseph Juran, a pioneer of quality management who defined quality as fitness for use, something that satisfies customer needs, said:

“In the USA, about a third of what we do consists of redoing work previously ‘done’.”


Juran’s stark ‘one third’ statistic came from his understanding of the inefficiencies built into the average organization’s business processes, and to a large extent many of those inefficiencies can be reduced with process improvement methods such as the Theory of Constraints, Lean and Six Sigma. Those same techniques, initially developed to improve manufacturing processes, should also be considered for improving the ‘manufacture of data’ in today’s organizations. If you think about it, most organizations are not so much focused on the manufacture of widgets as on the management of decisions based on the data produced by their business processes.

Juran’s ‘one third’ statistic is congruent with the chapter on the costs of poor data quality in Larry English’s “Information Quality Applied”, where English details statistical analyses showing that the costs of poor data quality run between 20% and 35% of the operating revenue of the average organization.

If the data produced by an organization’s business processes does not support its decision making, then its decisions will be sub-optimal. And if the data can be cajoled into supporting the critical decisions, but always has to be cleaned up, massaged and filtered before it is ‘safe for decision making’, then the organization is forced into an inefficient ‘data gathering and cleansing’ cycle that can build a dangerous time lag between when a decision should be made and when the data required to make it is finally clean enough to support it.


Here’s how you can estimate the cost of poor data quality in 5 simple steps.
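However you break the estimate into steps, it boils down to counting defects and attaching a dollar figure to each one. As a rough companion (not the official five-step method), here is a back-of-the-envelope sketch: multiply the number of defective records by the average rework cost per record, plus any downstream loss you can attribute to each defect. All of the numbers in the example are made up for illustration.

```python
# A hedged sketch of one common way to put a dollar figure on poor data
# quality: defective records x (rework cost + attributable downstream loss).
# The figures used below are illustrative, not benchmarks.

def cost_of_poor_quality(records: int,
                         defect_rate: float,
                         rework_cost_per_record: float,
                         downstream_loss_per_defect: float = 0.0) -> float:
    """Estimated annual cost of poor data quality for one data set, in dollars."""
    defects = records * defect_rate
    return defects * (rework_cost_per_record + downstream_loss_per_defect)

# Example: 1,000,000 customer records, 5% defective, $4 of clerical rework
# per defect and $10 of lost revenue per defect.
print(cost_of_poor_quality(1_000_000, 0.05, 4.0, 10.0))  # 700000.0
```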



IBM puts the cost of poor data quality in the USA at $3.1 trillion per year, which is roughly 15% of the USA’s GDP of $20 trillion. To paraphrase Everett Dirksen: “A trillion here, a trillion there, and pretty soon you’re talking about real money.”


So, it looks like the ‘ball park’ guesstimate for how much poor data quality is costing your company is between 15% and 35% of operating revenue, if your company is in the average range. Does that seem like a reasonable amount of money to waste? Is it reasonable to waste any money on poor data quality? Jack Olson, in his book “Data Quality: The Accuracy Dimension”, says that roughly half of the costs due to poor data quality can be recovered or mitigated; the other half is irrecoverable because data sets, interfaces and technologies are continuously evolving at both the local and industry level. Olson’s book is the only one Ralph Kimball has reviewed on Amazon, and after giving it 5 out of 5 stars, Kimball wrote, “This book is on my very short list of essential reading for data warehouse professionals.”
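To turn those percentages into something tangible, here is a tiny back-of-the-envelope sketch combining the figures quoted above: a cost of 15%–35% of operating revenue, of which (per Olson) roughly half might be recoverable. The $50M operating revenue figure is purely illustrative.

```python
# Back-of-the-envelope: apply the 15%-35% ballpark to a hypothetical
# operating revenue, then take Olson's "roughly half is recoverable".
operating_revenue = 50_000_000  # hypothetical annual operating revenue

for share in (0.15, 0.35):
    cost = operating_revenue * share
    recoverable = cost * 0.5  # Olson: about half of the cost can be recovered
    print(f"At {share:.0%}: cost ~ ${cost:,.0f}, recoverable ~ ${recoverable:,.0f}")
```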

So, without sounding facetious, can I ask a simple question? If you have not been keeping metrics on the quality of data in your organization, should you consider publishing a note to advise your stakeholders and/or shareholders that, due to lack of measurement, you do not know how much your poor data quality is costing them? Sounds extreme, doesn’t it? But we have all been hearing ideas about valuing an organization’s data and reporting that value on a regular basis. It just might happen. Look at the disruption caused by Europe’s General Data Protection Regulation (GDPR).

According to a March 2017 MIT Sloan article, many companies do not know the answer to the question “What Is Your Data Worth?”, but it seems reasonable to assume that if the data is of poor quality, it is literally worth less than if it were pristine. And we should do everything we can to keep those two words, ‘worth’ and ‘less’, from combining when an auditor assesses the value of our data!

  • Milan Kucera says:

    Dear Gordon, thank you very much for this article focused on the costs (impacts) of poor quality information (data). I agree with you, and for me this is the most important area of information quality management. It is in compliance with P. Crosby’s statement that “Quality is measured via costs and not indexes.”
    I did a few exercises in this area and I can confirm the general principle presented by P. Crosby (paraphrased): when you investigate the costs of poor quality for the first time, you find about 1/3 of them. My exercise shows that his view is correct, as I found roughly 1/3 of the total costs of poor quality. I think only money is important to management and can lead to a decision to focus on information quality.
    To identify those costs I use Larry English’s TIQM P3 process, which I have enhanced with other models for identifying costs so that I can compare the results from the different approaches.

    Finally somebody has opened up this important topic.

    • Gordon Hamilton says:

      Hi Milan, thank you for your insightful comments on this important topic. I am just a DQ teacher, and I really appreciate your years of experience implementing and measuring these costs. Your comments illustrate some really useful ideas and techniques that should be shared more widely. Would you consider writing a follow-up article on this topic to share on the LightsOnData.com blog? I’d be happy to help if you think that would be useful. Cheers, Gordon

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

About the author

Gordon Hamilton

Gordon Hamilton is an unapologetic Data Enthusiast, with years of experience integrating data quality into the data warehouse development process, modeling dimensional and 3NF data models, leading data migrations, analyzing data and helping customers find the signal in the noise. In his downtime, Gordon teaches Data Quality Improvement at BCIT, helps his former students ease Data Governance into their organizations and supports the new DAMA Vancouver Chapter. GHamilton@DataQuality.ca
