Estimating the cost of poor data quality in 5 steps

According to the IBM Big Data & Analytics Hub, poor data costs the US economy $3.1 trillion every year. Ovum Research reported that poor data can cost businesses at least 30% of revenues. These numbers are staggering and, in my opinion, should be enough to motivate any organization to start investing in data governance, but we know that’s not happening everywhere.

What is the cost of poor data for your organization?

As you might recall from the best data quality management trifecta, analyzing the business impact and cost of bad data is one of the first steps to take before cleansing it and implementing data quality controls. Analyzing this cost can be a daunting undertaking, and it usually involves both qualitative and quantitative methods, but here are some simple steps to get you started. For the sake of simplicity, we assume the default time frame is a year and that your organization manages only one data system.


Step 1: Number of added and updated data entries

Estimate the number of new data entries your organization enters into your system as well as the number of data updates being made. This can usually be taken from audit logs or comparisons of data snapshots taken at different times.
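As a minimal sketch of the snapshot comparison mentioned above, the following counts added and updated records between two snapshots. The snapshot contents, the record IDs, and the helper name `count_added_and_updated` are all hypothetical. Note that a record updated several times between snapshots is counted only once, so audit logs will usually give a higher and more accurate figure.

```python
# Hypothetical snapshots: record ID -> record contents at two points in time.
snapshot_start = {
    1: {"name": "Ann", "city": "Oslo"},
    2: {"name": "Bob", "city": "Lima"},
    3: {"name": "Cy", "city": "Rome"},
}
snapshot_end = {
    1: {"name": "Ann", "city": "Oslo"},    # unchanged
    2: {"name": "Bob", "city": "Quito"},   # updated
    4: {"name": "Dee", "city": "Cairo"},   # added
}


def count_added_and_updated(old, new):
    """Count records that are new or changed between two snapshots."""
    added = sum(1 for key in new if key not in old)
    updated = sum(1 for key in new if key in old and new[key] != old[key])
    return added, updated


added, updated = count_added_and_updated(snapshot_start, snapshot_end)
print(f"added={added}, updated={updated}, total={added + updated}")
```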

Step 2: The error rate

Note the percentage of data entries that contain errors. This can be estimated from data quality logs, data profiling, or interviews and survey responses conducted during the situational analysis phase.
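If the estimate comes from profiling a sample of records, the error rate is simply the share of sampled entries with at least one error. The sketch below assumes a hypothetical sample of 500 profiled entries, 40 of which contain errors; the function name `error_rate` is illustrative only.

```python
def error_rate(sample_size, records_with_errors):
    """Return the share of sampled entries that contain errors (0.0 to 1.0)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= records_with_errors <= sample_size:
        raise ValueError("records_with_errors must be between 0 and sample_size")
    return records_with_errors / sample_size


# Hypothetical profiling result: 40 of 500 sampled entries contain errors.
rate = error_rate(sample_size=500, records_with_errors=40)
print(f"Estimated error rate: {rate:.1%}")
```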

Step 3: Cost of poor data entry

This cost usually considers the time allotted to cleanse the data, the hourly cost of the position tasked with cleansing it, and/or any licensing and servicing fees for third-party tools used to cleanse this data. Three categories should be considered in estimating the cost of poor data entry, each with its own default cost range:

  1. if caught on entry – the default cost range is $0.50-$1
  2. if cleansed once it’s in the system – the default cost range is $1-$10
  3. if left unchecked – the default cost range is $10-$100. This can be quite high as it takes into account reputation erosion, risk of regulatory noncompliance, lost opportunity, misinformation to decision makers, etc.

Step 4: Percentage of detected poor data

For the three categories above, what percentage of data errors is caught and corrected at each stage? The three values should add up to 100%.

Ex: caught on entry: 20%, cleansed in the system: 50%, left unchecked: 30%
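A quick sanity check on these percentages can prevent a silent miscalculation later. The sketch below uses the example split above; the dictionary keys and the function name `validate_shares` are illustrative choices, not part of any established tool.

```python
import math

# Example split from the text, expressed as fractions of all data errors.
shares = {
    "caught_on_entry": 0.20,
    "cleansed_in_system": 0.50,
    "left_unchecked": 0.30,
}


def validate_shares(shares):
    """Raise ValueError unless the shares are non-negative and sum to 100%."""
    if any(value < 0 for value in shares.values()):
        raise ValueError("shares must not be negative")
    total = sum(shares.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"shares add up to {total:.2%}, expected 100%")
    return True


validate_shares(shares)
print("Shares add up to 100%")
```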

Step 5: Get the total cost

Now it’s time to put it all together. Here is an example of the total cost calculations:

[Figure: cost of poor data example]
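The calculation from the five steps can also be sketched in a few lines of code. The record volume (100,000) and error rate (5%) below are hypothetical placeholders, not figures from the original example, and the sketch assumes the default cost ranges from Step 3 apply per erroneous record. The split between categories is the example from Step 4.

```python
# All input values below are hypothetical placeholders; substitute your own.
records_per_year = 100_000  # Step 1: added + updated entries (assumed)
error_rate = 0.05           # Step 2: share of entries with errors (assumed)

# Step 3: default cost ranges (low, high) in dollars,
# assumed here to apply per erroneous record.
cost_ranges = {
    "caught_on_entry": (0.50, 1.00),
    "cleansed_in_system": (1.00, 10.00),
    "left_unchecked": (10.00, 100.00),
}

# Step 4: share of errors handled in each category (example split from the text).
shares = {
    "caught_on_entry": 0.20,
    "cleansed_in_system": 0.50,
    "left_unchecked": 0.30,
}


def total_cost_range(records, rate, shares, cost_ranges):
    """Step 5: return (low, high) yearly cost estimates in dollars."""
    errors = records * rate
    low = sum(errors * shares[c] * cost_ranges[c][0] for c in shares)
    high = sum(errors * shares[c] * cost_ranges[c][1] for c in shares)
    return low, high


low, high = total_cost_range(records_per_year, error_rate, shares, cost_ranges)
print(f"Estimated yearly cost: ${low:,.0f} to ${high:,.0f}")
```

With these assumed inputs, the 5,000 erroneous records cost between $18,000 and $176,000 per year, and the unchecked 30% accounts for most of that total.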


For more advanced calculations you can also include the rate of data decay. For customer records, data quality decays naturally even in records that have already been cleansed, due to death, change of address, change of name, change of contact preferences, etc. As a rule of thumb, this affects at least 10% of your entire customer database, and you can assume the same cost as for records left unchecked.
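As a rough sketch of the decay adjustment, the snippet below applies the 10% rule of thumb and the "left unchecked" cost range to a hypothetical database of 50,000 customer records. The database size and the function name `decay_cost_range` are assumptions for illustration, as is treating the cost range as a per-record figure.

```python
customer_records = 50_000               # hypothetical size of the customer database
decay_rate = 0.10                       # rule-of-thumb minimum from the text
unchecked_cost_range = (10.00, 100.00)  # dollars, assumed per decayed record


def decay_cost_range(records, rate, cost_range):
    """Return (low, high) yearly cost of decayed records in dollars."""
    decayed = records * rate
    return decayed * cost_range[0], decayed * cost_range[1]


decay_low, decay_high = decay_cost_range(
    customer_records, decay_rate, unchecked_cost_range
)
print(f"Estimated yearly decay cost: ${decay_low:,.0f} to ${decay_high:,.0f}")
```

This amount would be added on top of the total from Step 5, since it covers records that were correct when entered.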

Have you made any preliminary estimates of the cost of poor data quality? What other criteria did you consider?
