Managing reference data poorly can have a profound impact on your operations as well as business intelligence and analytical outcomes. Here are my 5 best practices for managing reference data:
1. Formalize reference data management (RDM)
More often than not, reference data is not maintained when no accountability and ownership have been assigned. Usually the IT team performs the initial load into the application or central repository (in cases such as a one-time data quality effort, adoption of a new application, or a data integration project), but then does not worry about keeping it up to date. Business users lack the time, resources, and even the understanding needed to track the required changes and keep the data current. At best, this is done at the system or report level and not enterprise-wide.
This can be addressed in multiple ways. The best-case scenario is to create an Enterprise Reference Data Unit (also called a reference data working group, reference data owners, reference data stewards, a reference data committee or council, etc.). This central group oversees reference data management across the enterprise, supporting the business needs accordingly. Another way is to ensure that reference data management is in the scope of data governance, or that RDM responsibilities are included in data stewards’ job descriptions.
If reference data is not already part of your data governance scope, make sure its management is approved as an ongoing program and not a one-time project.
2. Subscribe to external reference data
By default, first consult reference data provided by third-party standards authorities, such as ISO, SWIFT, ACORD, ICD, and so on. Apart from the resources required to discover, understand, adopt, and sometimes purchase this data, defaulting to these sources saves a lot of internal time, and you can be confident you’re adopting standards from a reliable source.
Please refer to the “The single best strategy for improving your mailing addresses” article to download a free list for all ISO 3166 country names and codes.
Adopting this data does not automatically subscribe you to updates, so you need to ensure that any changes made by the third-party authority are quickly detected and integrated into your master tables. For example, country codes change an average of 3 times per year. Sometimes a paid subscription is available that provides instant notification or integration with your technical environment via an API. Keep in mind that the ramifications of updating this data don’t stop at the table level; they extend to documentation and training, metadata and definitions, and anything hard-coded, such as forms, reports, and dashboards.
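As a rough illustration of detecting changes before integrating them into a master table, the sketch below compares the code list you currently hold against a newly received release. The function name `diff_code_sets` and the sample rows are assumptions made for this example; the rows are illustrative only and are not an authoritative ISO 3166 extract.

```python
def diff_code_sets(current: dict, incoming: dict) -> dict:
    """Compare two code -> name mappings and report what changed."""
    # Codes present only in the new release
    added = {code: name for code, name in incoming.items() if code not in current}
    # Codes that disappeared from the new release
    removed = {code: name for code, name in current.items() if code not in incoming}
    # Codes kept in both releases but with a different name: (old, new)
    renamed = {
        code: (current[code], incoming[code])
        for code in current.keys() & incoming.keys()
        if current[code] != incoming[code]
    }
    return {"added": added, "removed": removed, "renamed": renamed}


# Illustrative sample rows (hypothetical master table vs. hypothetical new release)
master_table = {"AN": "Netherlands Antilles", "SZ": "Swaziland", "CA": "Canada"}
new_release = {"SZ": "Eswatini", "CA": "Canada", "SS": "South Sudan"}

changes = diff_code_sets(master_table, new_release)
for change_type, rows in changes.items():
    print(change_type, rows)
```

A report like this can also serve as a checklist for the downstream items that need review, such as forms, reports, and dashboards where the old values are hard-coded.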
3. Govern internal reference data
Internal reference data is completely specific to your enterprise. In contrast to external reference data, the team responsible for internal reference data needs to focus on developing good reference data. Managing this data requires a defined process, guidelines, and ownership. Reference data management needs to be formalized (see the 1st best practice), and the individuals responsible for this data need to work with the:
- data governance team in ensuring the development of standards and guidelines, ownership identification, and following the operational model
- technical team (IT) in order to be aware of any technical considerations and ensure the IT delivery aligns with the reference data standards
- executives for acquiring support and resources
- business stakeholders to acquire definitions, use cases, information and feedback, as well as adoption
4. Manage reference data at the enterprise level
Reference data management needs to support multiple domains in order to avoid reference data silos. This outcome depends on the data governance operating model you adopt. As reference data becomes widely used across different enterprise systems, distribution of updates and new entries needs to be addressed. A data bus, or data hub, is a common solution, as it provides a central location from which any application can import its reference data. This import can happen automatically, by pulling the data through an API subscription or an automatic read/write function, or manually, through a batch or static file. If the import occurs manually due to technical limitations, any updates made at the data bus level should be communicated to the data stewards within the impacted business domains and to the data custodians of the impacted applications.
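The distribution model described above could be sketched roughly as follows. This is a minimal in-memory illustration, not a real data hub product: the class `ReferenceDataHub`, its method names, and the contact address are all hypothetical. Applications that integrate automatically call `pull`, while `publish` returns the contacts who need to be told about the update because their applications import manually.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReferenceDataHub:
    """Hypothetical central hub holding the latest snapshot of each code set."""
    versions: dict = field(default_factory=dict)         # code set -> version number
    snapshots: dict = field(default_factory=dict)        # code set -> {code: label}
    manual_contacts: dict = field(default_factory=dict)  # code set -> [contact, ...]

    def register_manual_consumer(self, code_set: str, contact: str) -> None:
        # Stewards/custodians of applications that cannot pull automatically
        self.manual_contacts.setdefault(code_set, []).append(contact)

    def publish(self, code_set: str, entries: dict) -> list:
        # Store the new snapshot, bump the version, and return who must be notified
        self.versions[code_set] = self.versions.get(code_set, 0) + 1
        self.snapshots[code_set] = dict(entries)
        return list(self.manual_contacts.get(code_set, []))

    def pull(self, code_set: str, known_version: int = 0) -> Optional[tuple]:
        # Automatic consumers ask: "is there anything newer than what I have?"
        current = self.versions.get(code_set, 0)
        if current <= known_version:
            return None
        return current, dict(self.snapshots[code_set])


hub = ReferenceDataHub()
hub.register_manual_consumer("currency", "finance-steward@example.com")

to_notify = hub.publish("currency", {"USD": "US Dollar", "EUR": "Euro"})
update = hub.pull("currency", known_version=0)      # automatic consumer gets version 1
no_update = hub.pull("currency", known_version=1)   # already current, nothing to import
```

Tracking a version number per code set lets each application decide cheaply whether it needs to re-import, and the returned contact list covers the manual case where communication replaces automation.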
5. Version your reference data
Because reference data is so prevalent across your systems, you need not only to ensure it is up to date across the enterprise, but also to keep track of its changes. This is particularly useful in data integration projects, master data management, and business intelligence deliverables. To address all of these, you need to know the effective date of each change and what the value was changed from.
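One possible way to record the effective date and the previous value is an effective-dated history, sketched below. The `CodeVersion` structure, the `version_as_of` helper, and the region codes are hypothetical and exist only for illustration; a real implementation would more likely live in a database table with similar columns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeVersion:
    code: str
    label: str
    effective_from: date
    previous_label: Optional[str] = None  # what the value was changed from


def version_as_of(history: list, code: str, as_of: date) -> Optional[CodeVersion]:
    """Return the version of `code` that was in effect on `as_of`, if any."""
    candidates = [
        v for v in history
        if v.code == code and v.effective_from <= as_of
    ]
    if not candidates:
        return None
    # The latest version that had already taken effect wins
    return max(candidates, key=lambda v: v.effective_from)


# Hypothetical internal code set: a sales region that was renamed
history = [
    CodeVersion("R1", "Region North", date(2019, 1, 1)),
    CodeVersion("R1", "Region Northeast", date(2021, 7, 1), previous_label="Region North"),
]

old = version_as_of(history, "R1", date(2020, 6, 1))
new = version_as_of(history, "R1", date(2022, 1, 1))
print(old.label, "->", new.label, "effective", new.effective_from)
```

Looking up a value "as of" a date is what allows a historical report or an integration job to interpret old records with the labels that applied when they were created.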
Have you considered managing your reference data? What best practices did you adopt or create?
Thank you for the refresher course. This was simple and to the point and effective. The only thing that I would add or perhaps provide as an update, are the current tools available to support this concept. Again, thanks.
Thanks for presenting great insights on RDM in simple way 😊
My pleasure, Santhosh