4 myths about Data Quality everyone thinks are true

Myth #1: It’s all about fixing the data

Data cleansing is an important part of improving data quality, but it’s not the only one. A sustainable data quality program can’t just fix the data. You need to understand what needs to be fixed and why; analyze the root cause of the issues and address any findings; understand your data environment and its interdependencies; and identify the data owners, stewards, and custodians. You also have to profile the data and understand not only the business logic by which data gets created, maintained, or consumed, but also how that logic conflicts with your technical constraints. Prevention is needed too, in the form of data entry validations, regular data quality audits, clear ownership and definitions, and well-understood business and technical processes, to sustain the level of data quality you need. For details on all of these, please read “The trifecta of the best data quality management”.
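To make the profiling step a bit more concrete, here is a minimal Python sketch. It counts missing and distinct values per field, which is often a first look at where the issues are. The records and field names are made up for illustration, and a real profiling effort would cover far more (formats, ranges, cross-field rules).

```python
# Hypothetical sample records; the field names are illustrative only.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.org", "postal_code": "V6T 1Z4"},
    {"name": "Alan Turing", "email": "", "postal_code": "V6T 1Z4"},
    {"name": "Grace Hopper", "email": "grace@example.org", "postal_code": None},
]


def profile(rows):
    """Return, per field, how many values are missing and how many are distinct."""
    fields = sorted({key for row in rows for key in row})
    report = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        # Treat None and empty strings as missing (an assumption for this sketch).
        present = [value for value in values if value not in (None, "")]
        report[field] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


summary = profile(records)
for field, stats in summary.items():
    print(field, stats)
```

A report like this only tells you what is wrong, not why. The root cause analysis and the conversation with the data owners still have to happen.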

Myth #2: It’s a one time project

I’ve seen a lot of organizations do this: they throw money at a project meant to improve a particular set of data for a particular purpose (e.g., physical addresses for a publication or appeal they need to send). The big issue is that it is seen as a one-time project when in fact maintaining data quality is never-ending. Even if you cleanse your data once, such as the physical addresses in our example, this data will decay just by sitting there. Why? People move, zip and postal codes change, and specific addresses cease to exist. Data quality always needs to be monitored. Plus, it’s never just about the one project. The quality of a data set can have multiple ramifications, and they can affect your business in more ways than you think. So remember that data quality should not be project-based, but program-based.
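One simple way to keep an eye on decay is a recurring check that flags records nobody has verified in a while. The sketch below assumes each address record carries a hypothetical last_verified date and that one year is an acceptable threshold; both are illustrative choices, not recommendations.

```python
from datetime import date, timedelta


def stale_records(rows, today, max_age_days=365):
    """Return the rows whose last verification is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [row for row in rows if row["last_verified"] < cutoff]


# Hypothetical address records; only the fields needed for the check are shown.
addresses = [
    {"id": 1, "last_verified": date(2023, 1, 15)},
    {"id": 2, "last_verified": date(2024, 5, 1)},
    {"id": 3, "last_verified": date(2022, 11, 30)},
]

stale = stale_records(addresses, today=date(2024, 6, 1))
stale_ids = [row["id"] for row in stale]
print(stale_ids)
```

Run on a schedule, a check like this turns the one-time cleanup into an ongoing program: the flagged records feed the next round of verification.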

Read about the 3 types of data quality projects a data steward should work on

Myth #3: It’s IT’s responsibility

This is the one I encounter most often. Data is technical, so it must be IT’s responsibility to ensure its quality is high. Wrong! First of all, bad data affects every unit of an organization and the organization as a whole. Potential revenue, as well as beneficial engagements and interactions with your constituent base, can be lost because of bad data. Second, even though IT plays an important role in offering the technical solution for improving the quality of the data, it is always the business that needs to offer the definitions for every data quality dimension: completeness, accuracy, timeliness, consistency, etc. In reality, data quality is EVERYONE’S RESPONSIBILITY. Even though it takes a long time to change people’s perception, this is something that constantly needs to be communicated in your presentations, status reports, onboarding, and other communication vehicles. For best practices on communication, please read the “3 communication steps for successful data management programs”.
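The split of responsibilities can be seen in a small Python sketch: IT can write generic scoring functions, but the inputs that give them meaning (which fields are required, which systems must agree) are business definitions. Everything below, including the two data sets, the field names, and the choice of required fields, is a hypothetical example.

```python
def completeness(rows, required):
    """Share of rows in which every required field has a non-empty value."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(field) not in (None, "") for field in required) for row in rows
    )
    return complete / len(rows)


def consistency(rows_a, rows_b, key, field):
    """Share of records present in both sources whose field values agree."""
    index_b = {row[key]: row.get(field) for row in rows_b}
    shared = [row for row in rows_a if row[key] in index_b]
    if not shared:
        return 0.0
    matching = sum(row.get(field) == index_b[row[key]] for row in shared)
    return matching / len(shared)


# Hypothetical extracts from two systems holding the same constituents.
crm = [
    {"id": 1, "name": "Ada", "email": "ada@example.org"},
    {"id": 2, "name": "Alan", "email": ""},
    {"id": 3, "name": "Grace", "email": "grace@example.org"},
    {"id": 4, "name": "Edsger", "email": "edsger@example.org"},
]
mailing_list = [
    {"id": 1, "email": "ada@example.org"},
    {"id": 3, "email": "g.hopper@example.org"},
    {"id": 4, "email": "edsger@example.org"},
    {"id": 5, "email": "new@example.org"},
]

# The business decides that a record needs both a name and an email to count as complete.
completeness_score = completeness(crm, required=("name", "email"))
consistency_score = consistency(crm, mailing_list, key="id", field="email")
print(completeness_score, consistency_score)
```

Change the required fields or the pair of systems being compared and the scores change with them, which is exactly why those definitions cannot come from IT alone.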

Myth #4: A good tool will ensure its success

This misconception applies not only to data quality, but to many other pain points an organization is trying to solve. Good tools are important and needed, but it’s the people who define the scope and the issues that need to be resolved, the people who analyze the causes of bad data, and the people who create the data quality and business rules for data cleansing, data integration, and overall data quality management, as well as assign roles and responsibilities for the ongoing maintenance of data quality. People and their skills are arguably the most important cog in the data quality improvement machine.

What myths have you heard that you thought were true?

Other frequently held beliefs and myths:
- “Legacy data is not great”
- “New/incoming data is better than old/legacy”
- “Data quality is what it is and will not worsen”
and of course:
- “We are unlikely to need old/legacy data in the future”.

George F says:
Very true. I’m guessing you had to deal with legacy data quite a bit? In your situation, were the business stakeholders ever convinced on the importance of legacy data?
- Martin Storey says:
  George,
  I am a geoscientist (in the oil and gas industry), so I spend much of my life attempting to extract value from existing data, typically field- or lab-acquired data that becomes “legacy” almost immediately after acquisition. In this context, “legacy data” refers to data liable to decay and to lose fitness for purpose (immediate and future, planned and unforeseen). I also spend quite a bit of time planning the acquisition of new data, and I am acutely aware of the pressures to do things faster and cheaper, regardless of value, to the extent that we frequently spend time and money acquiring data sets so half-arsed that they will be of no value.
  Data analysts know intimately about the importance of legacy data, although younger people have also been led to believe that technology can make up for any data issue (when in fact technology can make up and cover any data issue, but that doesn’t make it right or confident, only misleading). However, they tend to have little voice and use it even less.
  The problem is that decision makers, frequently their +i managers, frequently do not understand about data and are relentlessly bombarded by software manufacturers’ spin. There’s no sexiness in legacy data, when compared to new data… not least because legacy data is imperfect, whereas managers can decree that from now on, all data will be good.
  A presentation I gave some years ago to the data management community was entitled “Things are NOT getting better”. I wish they were and advocate for positive changes.
  - George F says:
    Thanks for sharing these details. Would love to see your presentation if you can share it on this site.
    Legacy data is definitely important, especially to derive meaningful analytics. That’s why one of the things I’ve seen work is a side-by-side comparison of trends or other analytics derived from past data, with and without legacy data included in the data set.
    Thank you again for sharing this with us.

About the author

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.

