Reference data needs a standard definition – here is one

When reference data comes up in discussions with other data management or data governance practitioners, we sometimes realize that we mean different things, or that our definitions overlap only about 80% of the time. In a data governance conversation, this seems a bit ironic. So what is reference data in the context of data management (let’s not worry about programming languages at this point)? I looked for a single standardized definition and, of course, could not find one – which, in the context of data governance, I see as another irony.

One thing that is agreed upon is that managing reference data is important, because:

it’s estimated that between 20% and 50% of the tables in a database house reference data
data quality issues in reference data have a cascading effect on data analysis, reporting and data integration
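To make the cascading effect concrete, here is a minimal sketch in Python. The order records, the alias mapping and the function names are all hypothetical and only serve as an illustration: when country values are not constrained by a shared reference list, a simple revenue report silently splits one country into several rows.

```python
from collections import Counter

# Hypothetical order records; the country values are entered inconsistently
# because no shared reference list constrains them.
orders = [
    {"order_id": 1, "country": "CA", "amount": 100},
    {"order_id": 2, "country": "Canada", "amount": 250},
    {"order_id": 3, "country": "CAN", "amount": 75},
    {"order_id": 4, "country": "US", "amount": 300},
]

# Assumed mapping from the variants above onto two-letter country codes.
COUNTRY_ALIASES = {"CA": "CA", "Canada": "CA", "CAN": "CA", "US": "US"}


def revenue_by_country(records):
    """Sum order amounts per country value, exactly as stored."""
    totals = Counter()
    for record in records:
        totals[record["country"]] += record["amount"]
    return dict(totals)


def standardize(records, aliases):
    """Return copies of the records with country mapped to the standard code."""
    return [{**record, "country": aliases[record["country"]]} for record in records]


raw_report = revenue_by_country(orders)
clean_report = revenue_by_country(standardize(orders, COUNTRY_ALIASES))

print(raw_report)    # Canada appears under three different keys
print(clean_report)  # one row per country
```

In the raw report the Canadian revenue is spread over three keys, so any downstream analysis or integration that groups by country inherits the error. Standardizing against a single list of codes restores one total per country.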

Reference data characteristics:

It is not created or changed as often as master data – Once you’ve loaded your table with currency types, you won’t have to update it often. For example, the “new” Euro currency came into effect on January 1st, 1999. The currencies it replaced can, in some cases, be redeemed indefinitely after they ceased to be legal tender, while in other cases there is an official cut-off date.
Shared by multiple systems within or outside the enterprise – For example, the list of countries, sex and gender codes, types of diseases, units of measurement, etc.
It does not describe things that the enterprise does business with, but rather it categorizes the data which describes the enterprise’s transactions and master data – Such as the type of products, status of the orders, location of the customers, etc.
Each piece of data has a distinct definition – For example, the type of an organization could be a corporation, foundation, government corporation, non-profit organization, and so on, each with its own specific definition
Often defined by 3rd party bodies – For example, ISO, UN, WHO, etc.

Besides being defined by 3rd party bodies, which can be specific to a business domain or world-wide, such as the World Health Organization, reference data can also be specific to an organization. Therefore, reference data can be split into 3 categories:

Universal reference data
Industry reference data
Internal reference data

Here are some reference data examples across these categories:

[Figure: reference data examples]

So what is reference data?

A set of permissible values associated with a distinct definition, used within a system or shared between multiple systems in an organization, domain or industry, which provides a standardized semantic to further categorize a data record.
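As an illustration of this definition, here is a minimal sketch in Python. The class, the order status codes and their definitions are hypothetical and not taken from any standard: a reference data set is modeled as permissible values, each tied to a distinct definition, and is then used to check the field that categorizes a data record.

```python
from dataclasses import dataclass


@dataclass
class ReferenceDataSet:
    """A named set of permissible codes, each with a distinct definition."""
    name: str
    values: dict  # code -> definition

    def is_permissible(self, code):
        return code in self.values

    def definition(self, code):
        return self.values[code]


# Hypothetical internal reference data: order statuses.
ORDER_STATUS = ReferenceDataSet(
    name="order_status",
    values={
        "NEW": "Order received but not yet processed",
        "SHIPPED": "Order handed over to the carrier",
        "CANCELLED": "Order cancelled before shipment",
    },
)


def validate_record(record, field_name, reference):
    """Raise ValueError if the record's field is not a permissible value."""
    value = record.get(field_name)
    if not reference.is_permissible(value):
        raise ValueError(
            f"{value!r} is not a permissible {reference.name} value"
        )
    return record


order = validate_record({"order_id": 42, "status": "SHIPPED"}, "status", ORDER_STATUS)
print(ORDER_STATUS.definition(order["status"]))
```

The same set could be shared by several systems, which is what gives the status field a standardized meaning wherever the record travels. A value such as "shipped" in lower case would be rejected here, because only the listed codes are permissible.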

Reference data is your status codes, product codes, flags and attributes, lookup tables, categories, and so on. From an end-user perspective, it’s usually what you find in drop-down fields.

Once your reference data is understood, the conclusion is simple: organizations need to invest in a reference data management program to create operational efficiencies and aid the development of valuable information for all levels of the organization.

How do you define reference data? Please feel free to contribute your own definition or improve the one above. Meanwhile, read through the others I found.

Other definitions

IBM:

Reference data refers to data that is used to categorize other data within enterprise applications and databases. – link to definition

Simplicable:

Reference data is data that is used to structure and constrain other data. It is typically stable information with a known set of values that rarely change. – link to definition

Danette McGilvray & Gwen Thomas:

Reference data are sets of values or classification schemas that are referred to by systems, applications, data stores, processes, and reports, as well as by transactional and master records. – link to definition

About the author

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
