Estimating the cost of poor data quality in 5 steps

According to the IBM Big Data & Analytics Hub, poor data quality costs the US economy $3.1 trillion every year. Ovum Research reported that poor data quality can cost businesses at least 30% of their revenue. These numbers are staggering and, in my opinion, should be enough to motivate any organization to start investing in data governance, but we know that isn't happening everywhere.

What is the cost of poor data for your organization?

As you might recall from the best data quality management trifecta, analyzing the business impact and cost of bad data is one of the first steps to take before cleansing it and implementing data quality controls. This analysis can be a daunting undertaking and usually involves both qualitative and quantitative methods, but here are some simple steps to get you started. For simplicity, we assume a one-year time frame and that your organization manages only one data system.

[Figure: estimating the cost of poor data quality]

Step 1: Number of added and updated data entries

Estimate the number of new data entries your organization adds to the system, as well as the number of updates made to existing data. These figures can usually be obtained from audit logs or by comparing data snapshots taken at different times.
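When comparing snapshots, added entries are IDs present only in the later snapshot, and updated entries are IDs present in both snapshots whose values changed. A minimal sketch, assuming each snapshot is a dictionary mapping a record ID to its field values; the function name and sample data are hypothetical:

```python
from typing import Dict, Tuple

# Hypothetical snapshot format: record ID -> field values.
Snapshot = Dict[str, dict]


def count_changes(old: Snapshot, new: Snapshot) -> Tuple[int, int]:
    """Return (added, updated) counts between two snapshots of the same table."""
    # IDs that appear only in the newer snapshot were added.
    added = sum(1 for rid in new if rid not in old)
    # IDs present in both snapshots whose values differ were updated.
    updated = sum(1 for rid, row in new.items() if rid in old and old[rid] != row)
    return added, updated


if __name__ == "__main__":
    january = {"c1": {"city": "Vancouver"}, "c2": {"city": "Toronto"}}
    december = {
        "c1": {"city": "Vancouver"},
        "c2": {"city": "Calgary"},  # changed value -> update
        "c3": {"city": "Ottawa"},   # new ID -> addition
    }
    added, updated = count_changes(january, december)
    print(f"added={added}, updated={updated}")
```

One caveat of this approach: a record updated several times between the two snapshots is counted only once, so audit logs will usually give a more accurate update count.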

Step 2: The error rate

Note the percentage of data entries that contain errors. This can be estimated from data quality logs, data profiling, or the interviews and surveys conducted during the situational analysis phase.
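If you go the data profiling route, the error rate is the number of entries that fail at least one quality check divided by the total number of entries checked. A minimal sketch, assuming records are Python dictionaries; the two validation rules and the field names are hypothetical placeholders for your own checks:

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical validation rules: each returns True when the record passes.
RULES: List[Callable[[Record], bool]] = [
    lambda r: "@" in r.get("email", ""),               # email looks plausible
    lambda r: bool(r.get("postal_code", "").strip()),  # postal code is filled in
]


def error_rate(records: List[Record], rules: List[Callable[[Record], bool]] = RULES) -> float:
    """Fraction of records failing at least one rule (0.05 means 5%)."""
    if not records:
        return 0.0
    failing = sum(1 for r in records if not all(rule(r) for rule in rules))
    return failing / len(records)


if __name__ == "__main__":
    sample = [
        {"email": "a@example.com", "postal_code": "A1B 2C3"},
        {"email": "b@example.com", "postal_code": ""},  # missing postal code
        {"email": "c@example.com", "postal_code": "D4E 5F6"},
        {"email": "d@example.com", "postal_code": "G7H 8J9"},
    ]
    print(f"error rate: {error_rate(sample):.0%}")
```

A record that fails several rules is counted once, so the result is the share of entries containing at least one error, which is what this step asks for.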

Step 3: Cost of poor data entry

This cost usually accounts for the time allotted to cleanse the data, the hourly cost of the position tasked with cleansing it, and/or any licensing and service fees for third-party cleansing tools. Three categories should be considered when estimating the cost of poor data entry, each with its own default cost range per erroneous entry:

if caught on entry – the default cost range is $0.50-$1
if cleansed once it's in the system – the default cost range is $1-$10
if left unchecked – the default cost range is $10-$100. This range is high because it accounts for reputational damage, risk of regulatory non-compliance, lost opportunities, misinformed decision makers, etc.
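The labour and tool costs described above can be turned into your own per-entry figure to compare against the default ranges. A minimal sketch, assuming labour is measured as minutes spent per fix and that annual tool fees are spread evenly over the entries the tool fixes; the function name, parameters and sample numbers are all hypothetical:

```python
def cost_per_entry(minutes_per_fix: float, hourly_rate: float,
                   annual_tool_fees: float = 0.0, entries_fixed: int = 1) -> float:
    """Estimated cost of correcting one erroneous entry, in dollars.

    Labour: minutes per fix converted to hours, times the hourly cost of the role.
    Tools: third-party licensing/service fees spread evenly over the entries fixed.
    """
    if entries_fixed <= 0:
        raise ValueError("entries_fixed must be positive")
    labour = minutes_per_fix / 60 * hourly_rate
    tools = annual_tool_fees / entries_fixed
    return labour + tools


if __name__ == "__main__":
    # Hypothetical: 6 minutes per fix by a $40/hour analyst,
    # plus a $5,000/year tool used to fix 2,000 entries.
    cost = cost_per_entry(minutes_per_fix=6, hourly_rate=40,
                          annual_tool_fees=5000, entries_fixed=2000)
    print(f"${cost:.2f} per entry")
```

With these made-up inputs the result is $6.50 per entry ($4.00 of labour plus $2.50 of tool fees), which would fall within the $1-$10 range for data cleansed once it is in the system.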

Step 4: Percentage of detected poor data

For each of the three categories above, what percentage of data errors falls into it? The three values should add up to 100%.

For example: caught on entry: 20%, cleansed in the system: 50%, left unchecked: 30%.
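Because the three percentages must together cover every erroneous entry, it is worth checking them before moving on. A minimal sketch; the function name and category keys are hypothetical:

```python
import math
from typing import Dict


def check_shares(shares: Dict[str, float]) -> None:
    """Raise ValueError unless the category percentages are non-negative and add up to 100."""
    if any(value < 0 for value in shares.values()):
        raise ValueError("percentages cannot be negative")
    total = sum(shares.values())
    # Tolerance allows fractional splits such as 33.3 / 33.3 / 33.4.
    if not math.isclose(total, 100.0, abs_tol=1e-9):
        raise ValueError(f"percentages add up to {total}%, expected 100%")


if __name__ == "__main__":
    # The example split from this step.
    check_shares({"caught_on_entry": 20, "cleansed_in_system": 50, "left_unchecked": 30})
    print("shares OK")
```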

Step 5: Get the total cost

Now it’s time to put it all together. Here is an example of the total cost calculations:

[Figure: cost of poor data example]
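The steps can be combined as follows: erroneous entries = (added + updated entries) × error rate, and the total cost is the sum, over the three categories, of erroneous entries × category share × cost per entry. A minimal Python sketch, assuming the default cost ranges from Step 3 and using hypothetical inputs; the function name is illustrative, and this is not necessarily how the example above was computed:

```python
from typing import Dict, Tuple

# Default per-entry cost ranges from Step 3, in dollars: (low, high).
COST_RANGES: Dict[str, Tuple[float, float]] = {
    "caught_on_entry": (0.5, 1.0),
    "cleansed_in_system": (1.0, 10.0),
    "left_unchecked": (10.0, 100.0),
}


def total_cost(added: int, updated: int, error_rate: float,
               shares: Dict[str, float],
               ranges: Dict[str, Tuple[float, float]] = COST_RANGES) -> Tuple[float, float]:
    """Return the (low, high) yearly cost of poor data.

    added, updated: Step 1 counts; error_rate: Step 2 as a fraction (0.05 = 5%);
    shares: Step 4 percentages per category, adding up to 100.
    """
    erroneous = (added + updated) * error_rate
    low = sum(erroneous * shares[c] / 100 * ranges[c][0] for c in ranges)
    high = sum(erroneous * shares[c] / 100 * ranges[c][1] for c in ranges)
    return low, high


if __name__ == "__main__":
    # Hypothetical volumes and error rate; the shares are the Step 4 example.
    low, high = total_cost(
        added=80_000, updated=20_000, error_rate=0.05,
        shares={"caught_on_entry": 20, "cleansed_in_system": 50, "left_unchecked": 30},
    )
    print(f"${low:,.0f} to ${high:,.0f} per year")
```

With these hypothetical inputs, 100,000 entries at a 5% error rate give 5,000 erroneous entries, and a yearly cost between $18,000 and $176,000. Most of it comes from the 30% left unchecked, which is why catching errors earlier pays off.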

Conclusion

For more advanced calculations, you can also include the rate of data decay. Customer records decay naturally, even after they have been cleansed, because of deaths and changes of address, name, contact preferences, etc. As a rule of thumb, assume this affects at least 10% of your entire customer database, and apply the same cost as for records left unchecked.
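The decay adjustment can be estimated as a separate yearly term and added to the total. A minimal sketch, assuming the 10% rule of thumb and the $10-$100 left-unchecked range as defaults; the function name and the 50,000-record database are hypothetical:

```python
from typing import Tuple


def decay_cost(customer_records: int, decay_rate: float = 0.10,
               unchecked_range: Tuple[float, float] = (10.0, 100.0)) -> Tuple[float, float]:
    """(low, high) yearly cost of customer records that go stale.

    decay_rate: share of the customer database that decays in the year
    (rule of thumb: at least 0.10).
    unchecked_range: per-record cost of data left unchecked, from Step 3.
    """
    decayed = customer_records * decay_rate
    low, high = unchecked_range
    return decayed * low, decayed * high


if __name__ == "__main__":
    # Hypothetical customer database size.
    low, high = decay_cost(50_000)
    print(f"decay adds ${low:,.0f} to ${high:,.0f} per year")
```

For 50,000 customer records, 10% decay means 5,000 stale records, or $50,000 to $500,000 per year at the left-unchecked rate.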

Have you done any preliminary estimates of your poor data quality costs? What other criteria did you consider?


About the author

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
