The ultimate guide to a Data Quality issues log

If you’ve already started, or are planning to start, a data governance program to support your data quality improvement goals, you need a structured way of tracking your data quality issues and their status. There are, of course, different ways of doing this: dedicated data quality tools, incident management and ticketing systems, knowledge-sharing or intranet platforms such as SharePoint, or even a simple Excel file.

Spoiler: I’m offering a free template of a data quality issues log at the end of this article.

Depending on the tools, the overall scope of the data quality initiatives, and the organization itself, you can track data quality issues in multiple ways. Throughout my career I have tracked them as data incidents, as part of a status reporting model, through project-based issue tracking, and in a dedicated data quality issues log. Recently I was having a conversation on the topic at a data quality event, and I was asked:

What is your ultimate guide to a data quality issues log?

Well, regardless of the tool used to create this log, here are the attributes I typically include in a data quality issues log, split into three categories:

1. Issue details

ID: A unique ID is always worth having in any inventory, so that technical staff, business analysts, and business users can reference a data quality issue quickly and unambiguously.

Name of issue/ Title: Obvious as it may seem, a short title for each data quality issue is important: it is usually what business users will reference, and it gives them a quick overview of the issue without having to read all of its details.

Detailed description: Any details that offer further context about where the issue was found and which systems, processes, reports, etc. are known to be affected, recorded before an in-depth analysis is done.

Status: Use this field to track how many data quality issues have been identified and submitted, are in progress, or are resolved. I recommend the following options: backlog (the initial status of an issue), assigned (when resources have been identified and assigned), in progress, testing, closed/resolved, and on hold.
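To show how a fixed set of status values makes it easy to count issues at each stage, here is a minimal Python sketch. It assumes the Status column has been exported as a plain list of values; the Status enum and the count_by_status function are hypothetical names for illustration, not part of the template.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    """Status options recommended for the issues log (hypothetical encoding)."""
    BACKLOG = "backlog"          # initial status of every new issue
    ASSIGNED = "assigned"        # resources identified and assigned
    IN_PROGRESS = "in progress"
    TESTING = "testing"
    CLOSED = "closed/resolved"
    ON_HOLD = "on hold"


def count_by_status(statuses: list[Status]) -> dict[Status, int]:
    """Count log entries per status, reporting 0 for statuses with no entries."""
    counts = Counter(statuses)
    # Counter returns 0 for missing keys, so every status appears in the result.
    return {status: counts[status] for status in Status}


# Sample values, e.g. read from the Status column of an exported log.
log_statuses = [Status.BACKLOG, Status.IN_PROGRESS, Status.BACKLOG, Status.CLOSED]
for status, n in count_by_status(log_statuses).items():
    print(f"{status.value}: {n}")
```

Using a closed set of values (rather than free text) is what keeps these counts reliable; a drop-down list in a spreadsheet achieves the same thing.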

Date raised/ Date added: This date field helps you keep track of when data issues are submitted, which can help you identify how long an issue remains unresolved.

Target resolution date: Use this date field to track the date by which the issue needs to be resolved, based on any dependencies it might have (ex: another technical project, a business process redesign, a report deployment, etc.). This date can be a good indicator of the issue’s risk status.
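The two date fields can be combined into simple indicators for unresolved issues. The Python sketch below computes how long an issue has been open and derives a rough risk status from its target resolution date. The 14-day warning window and the labels overdue, at risk, and on track are assumptions for illustration, not values defined by the template; adjust them to your own definitions.

```python
from datetime import date


def issue_age_days(date_raised: date, as_of: date) -> int:
    """Number of days an unresolved issue has been open as of a given date."""
    return (as_of - date_raised).days


def risk_status(target_date: date, as_of: date, warning_days: int = 14) -> str:
    """Rough risk indicator based on the target resolution date.

    warning_days is an illustrative threshold (assumption), not a standard value.
    """
    days_left = (target_date - as_of).days
    if days_left < 0:
        return "overdue"
    if days_left <= warning_days:
        return "at risk"
    return "on track"


today = date(2024, 3, 1)
print(issue_age_days(date(2024, 1, 15), today))  # 46
print(risk_status(date(2024, 3, 10), today))     # at risk
```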

Importance: This drop-down field helps you prioritize the items in the log and can sometimes determine the target resolution date. I use the following levels: critical, high, medium, and low, though it’s up to you to decide how to define them.

Category: This element depends on your data governance and data quality models, but try to find a meaningful way to categorize your issues by data governance area, data quality measure, or both. You can add sub-categories, and a single entry can belong to multiple categories. For example, one data quality issue could be categorized under timeliness, accuracy, and no standards. New categories can be added as you go along.
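Taken together, the issue details behave like a small record type. The following Python sketch, using hypothetical field names that mirror the attributes above, models one log entry, lets a single issue carry several categories, and orders the log by importance and then by the earliest target resolution date. That ordering rule is an assumption; your own prioritization may weigh other factors.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import IntEnum


class Importance(IntEnum):
    # Lower value means higher priority, so ascending sort puts critical first.
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


@dataclass
class Issue:
    issue_id: str
    title: str
    description: str
    status: str
    date_raised: date
    target_date: date
    importance: Importance
    # One issue may belong to several categories at once.
    categories: set[str] = field(default_factory=set)


def prioritize(issues: list[Issue]) -> list[Issue]:
    """Order issues by importance, then by the earliest target resolution date."""
    return sorted(issues, key=lambda i: (i.importance, i.target_date))


def in_category(issues: list[Issue], category: str) -> list[Issue]:
    """Return the issues tagged with the given category."""
    return [i for i in issues if category in i.categories]


# Hypothetical sample entries.
log = [
    Issue("DQ-002", "Late enrolment updates", "Nightly load runs a day behind",
          "backlog", date(2024, 2, 1), date(2024, 4, 30),
          Importance.MEDIUM, {"timeliness"}),
    Issue("DQ-001", "Duplicate student records", "Same person entered twice",
          "in progress", date(2024, 1, 10), date(2024, 3, 31),
          Importance.CRITICAL, {"accuracy", "no standards"}),
]
print([i.issue_id for i in prioritize(log)])  # ['DQ-001', 'DQ-002']
```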

2. Resources & Ownership

Business unit: This field is not mandatory, but I recommend it because it helps you understand the resources spent resolving data quality issues owned by a particular business unit. Of course, some issues are owned at the organization level.

Business owner: Who has sign-off authority when an issue is resolved, and who decides the business rules that the data quality standards and requirements need to conform to?

Business analyst: Ideally, this type of resource exists in your organization, as their skills are crucial for identifying the business needs, understanding the technical limitations, and figuring out the root cause of the data quality issue.

Technical resource: Person(s) tasked with implementing the technical solution (ex: modifying the metadata, updating the user interface, implementing controls, creating audits, etc.), performing data profiling, cleansing the data, and so on.

Testing lead: Although technical resources should always have someone to help test their work, this field is meant to track the testing resource on the business side: the person who knows this data and the business rules.

3. Final Resolution

Root cause: Determining the root cause will not only help you identify the fix but also help prevent the issue from happening again. Note the reason for the issue, which could be lack of ownership, no clearly defined standards, no audits, no data validation, technical limitations, lack of training, incorrect or missing definitions, an incorrect or ambiguous business process, and much more.

Resolution details: What had to be done to fix this data quality issue and prevent it from happening again? Use this as a reference point when similar entries are added to the log.

Completed Date: Not all submissions are completed by the target date. Ideally they are resolved earlier, but some are resolved later. Calculating the difference between the target date and the completed date yields some interesting measures and might help your case for acquiring further resources.
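As an illustration of the target-versus-completed comparison, the Python sketch below treats the difference in days as a schedule variance: positive when an issue was resolved late, negative when early. The function name and the sample dates are hypothetical.

```python
from datetime import date
from statistics import mean


def schedule_variance_days(target_date: date, completed_date: date) -> int:
    """Positive = resolved late, negative = resolved early, 0 = on the target date."""
    return (completed_date - target_date).days


# Hypothetical (target date, completed date) pairs for resolved issues.
resolved = [
    (date(2024, 3, 31), date(2024, 3, 25)),  # 6 days early
    (date(2024, 4, 15), date(2024, 5, 5)),   # 20 days late
    (date(2024, 5, 1), date(2024, 5, 1)),    # on time
]
variances = [schedule_variance_days(t, c) for t, c in resolved]
late = [v for v in variances if v > 0]
print(variances)                                 # [-6, 20, 0]
print(f"{len(late)} of {len(variances)} resolved late")
print(f"average variance: {mean(variances):.1f} days")
```

Tracking the share of late resolutions alongside the average keeps one badly delayed issue from hiding an otherwise punctual record, or the reverse.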

Status report notes: This stands as its own area: as you progress with the analysis, resolution, and testing, record each week what was done in the past week, so that the business owner can refer to it.

A very important aspect of all the fields listed above is consistency. Ensure that you track dates in the same format, that you have guidelines on when to fill out the status report notes (ideally after each change, summarized by week), that you have a standard for titles (ex: only capitalize the first word), and that you always name your resources the same way (with first and last name, for example).
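Some of these consistency rules can be checked automatically. The Python sketch below assumes YYYY-MM-DD (ISO 8601) dates, sentence-case titles, and business owners recorded by first and last name; the field names and checks are hypothetical heuristics. The title check will also flag proper nouns, and the name check will reject some valid names, so treat the output as warnings for review rather than hard errors.

```python
import re
from datetime import datetime


def is_iso_date(value: str) -> bool:
    """True if value is a real calendar date written exactly as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:  # e.g. 2024-02-30
        return False
    return True


def is_sentence_case(title: str) -> bool:
    """Heuristic: first word capitalized, later words not in Title Case."""
    words = title.split()
    if not words or not words[0][:1].isupper():
        return False
    # All-caps acronyms such as "SIS" are allowed; "Student" is flagged.
    return not any(w[:1].isupper() and w[1:].islower() for w in words[1:])


def is_full_name(name: str) -> bool:
    """Heuristic: at least two capitalized parts (first and last name)."""
    parts = name.split()
    return len(parts) >= 2 and all(p[:1].isupper() for p in parts)


def consistency_warnings(entry: dict[str, str]) -> list[str]:
    """Collect warnings for one log entry (hypothetical field names)."""
    warnings = []
    for field_name in ("date_raised", "target_date"):
        if not is_iso_date(entry.get(field_name, "")):
            warnings.append(f"{field_name} is not in YYYY-MM-DD format")
    if not is_sentence_case(entry.get("title", "")):
        warnings.append("title should capitalize only the first word")
    if not is_full_name(entry.get("business_owner", "")):
        warnings.append("business_owner should be a first and last name")
    return warnings


entry = {
    "title": "Duplicate Student Records",
    "date_raised": "2024-01-10",
    "target_date": "31/03/2024",
    "business_owner": "Jane Doe",
}
for w in consistency_warnings(entry):
    print(w)
```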

As I mentioned before, there are many ways of keeping a data quality issues log, but this is my ultimate guide. What else do you think should be included?

Free template:

Download the “Data Quality Issue Log Template” (DQ-Issue-Log.xlsx, 20.62 KB).

Veggie

We appreciate this work. It is usable. It will be good if an example is added to appreciate detail that could be added under each column

One thing I always include as a separate attribute is ‘Business Impact’ as this will help inform priorities if there are many issues, plus it’s a way to link value to data.

Thanks for the template and the article. It is a great starting point!

Suggestion for templates: Include a couple of good examples (filled out rows) to help the reader

This is very helpful to get started with, thanks. Too bad we can’t also download an accompanying DG team of people to implement and maintain this for us, plus all the data cleanup that goes with it! Thanks for the xlsx.

About the author

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
