I love metadata because of the benefits it offers. In fact, I never “metadata” \met a data\ I did not find useful (geeky joke, I know). If you don’t have a metadata management program or initiative in place, there are plenty of reasons to consider investing in one. The following 4 main roles of metadata should give you a glimpse of the benefits of managing it:
Data has many characteristics by which it can be grouped or classified. Why should you care? Because these categories allow you to organize and manage it. Data can be classified by any of the following:
- Subject – Ex: financial data, student data, fundraising data, health data, product data, etc.
- Usage – Ex: transactional, analytical, regulatory, etc.
- Time – Ex: live and current data, historical, predictive, etc.
- Content – Ex: geo-spatial data, machine data, structured vs unstructured data, etc.
- Scope – Ex: enterprise, external, departmental, master data, etc.
Managing data by classification or group allows you to apply the same standards, procedures and processes, and to assign the same data stewards and owners, to everything in that group. The same data can fall under multiple groups, though, which adds another layer of complexity to how it should be managed. For example, metadata can indicate that the same data is transactional, falls under GDPR, is enterprise-wide, and is live, unstructured health data. Usually these groups can be placed within a hierarchy to determine which classification takes precedence over the others.
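To make the precedence idea concrete, here is a minimal Python sketch. The `PRECEDENCE` order, the `DataAsset` class, and the label names are all hypothetical choices made for illustration; a real hierarchy would come from your own governance policy.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical precedence order (an assumption): earlier labels outrank later ones.
PRECEDENCE = ["gdpr", "health", "enterprise", "transactional", "live", "unstructured"]


@dataclass
class DataAsset:
    name: str
    classifications: set = field(default_factory=set)


def governing_classification(asset: DataAsset) -> Optional[str]:
    """Return the highest-precedence classification attached to the asset."""
    for label in PRECEDENCE:
        if label in asset.classifications:
            return label
    return None  # the asset carries no known classification


# The same data falling under several groups at once, as in the example above.
visits = DataAsset(
    "patient_visits",
    {"transactional", "gdpr", "enterprise", "live", "unstructured", "health"},
)
print(governing_classification(visits))
```

A flat ordered list is the simplest possible hierarchy; a tree or a per-conflict rule table may fit better once classifications from different dimensions need different tie-breakers.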
If you want to know about the 3 classification groups to help with GDPR, please read our other article, too.
Describing the data helps you understand both its logical and physical aspects. A description should cover:
- Data meaning – Business definitions, data modeling entities and attributes
- Data structure – Description of data objects (entities, tables, records, etc.), their logical groupings and relationships
- Data content – The types of data such as date, currency, text, number, etc.
- Data values – What values are allowed, what reference data is available, what patterns or value ranges it should follow, what constraints it should meet, etc.
- Data lineage – What the data source is, how the data was created, derived, and/or calculated, how it was transformed, etc.
Without this description you will be treading water when collecting data, integrating it with internal or external systems, maintaining it, or deriving useful information from it.
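As a rough illustration, descriptive metadata can be held in a small record and used to check incoming values. The sketch below is one possible shape, not a standard: the `FieldDescription` class, its field names, and the `crm.customers.country` source are assumptions made up for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldDescription:
    name: str
    meaning: str                           # data meaning: business definition
    data_type: type                        # data content: date, text, number, ...
    allowed_values: Optional[set] = None   # data values: reference data
    pattern: Optional[str] = None          # data values: regex the value must match
    source: str = ""                       # data lineage: where the value comes from


def violations(desc: FieldDescription, value) -> list:
    """List the ways a value fails to match its description."""
    if not isinstance(value, desc.data_type):
        return [f"{desc.name}: expected {desc.data_type.__name__}"]
    problems = []
    if desc.allowed_values is not None and value not in desc.allowed_values:
        problems.append(f"{desc.name}: {value!r} is not an allowed value")
    if (desc.pattern is not None and isinstance(value, str)
            and not re.fullmatch(desc.pattern, value)):
        problems.append(f"{desc.name}: {value!r} does not match {desc.pattern}")
    return problems


country = FieldDescription(
    name="country_code",
    meaning="Two-letter code of the customer's country",
    data_type=str,
    allowed_values={"CA", "US", "GB"},
    pattern=r"[A-Z]{2}",
    source="crm.customers.country",  # hypothetical source system
)
print(violations(country, "CA"))
print(violations(country, "ca"))
```

Keeping the description separate from the data means the same record can drive validation, documentation, and integration mappings.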
Metadata can serve as a guide that helps any technical or business user find the data they need, through search engines or other processes. This guidance metadata can comprise:
- Keywords – This could be any metadata described so far
- Taxonomies – Yet another example of how classification helps
- Date/time stamps – Usually automatically added at the table or row level
- Associated reports, processes, people – Knowing where data surfaces, who the data users or data stewards are, how data is captured and transformed could serve as a good starting point for finding what you need
- Synonyms, aliases, related terms
Providing your stakeholders with the guidance to find the data they need for reporting, analyzing, testing, prototyping, troubleshooting, etc., saves time and makes better use of available resources.
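A simple way to picture guidance metadata at work is a lookup that matches a search term against keywords and synonyms. This is only a sketch: the `CATALOG` contents and the `find_datasets` helper are hypothetical, and a real catalog would sit behind a search engine rather than a dictionary.

```python
# Hypothetical catalog: dataset name -> guidance metadata.
CATALOG = {
    "donor_gifts": {
        "keywords": {"fundraising", "donation"},
        "synonyms": {"gift", "contribution"},
    },
    "student_enrolment": {
        "keywords": {"student", "registration"},
        "synonyms": {"enrollment", "admission"},
    },
}


def find_datasets(term: str, catalog: dict) -> list:
    """Return the names of datasets whose keywords or synonyms contain the term."""
    term = term.lower()
    return sorted(
        name
        for name, meta in catalog.items()
        if term in meta["keywords"] | meta["synonyms"]
    )


print(find_datasets("Gift", CATALOG))
print(find_datasets("enrollment", CATALOG))
```

Synonyms matter here: a user searching for "gift" finds the dataset even though its owners tagged it "donation".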
Metadata can tell you which data must be controlled and which controls should be enforced on it. The constraints it captures stem from:
- Regulatory compliance & internal policies
- Retention & archival
- Privacy & security
- Service levels & business requirements
- Technical requirements
These controls help ensure compliance with internal and external rules and regulations, policies, and business and technical requirements.
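One way to act on control metadata is to map tags to the controls they require and take the union for each dataset. The tag names, control names, and the seven-year retention figure below are invented for illustration and do not reflect any particular regulation.

```python
# Hypothetical rules: metadata tag -> controls that tag requires.
CONTROL_RULES = {
    "personal_data": {"mask_in_reports", "encrypt_at_rest"},
    "financial": {"retain_7_years", "encrypt_at_rest"},
    "public": set(),
}


def required_controls(tags: set) -> set:
    """Union of the controls required by every tag on a dataset."""
    controls = set()
    for tag in tags:
        # Unknown tags contribute no controls in this sketch.
        controls |= CONTROL_RULES.get(tag, set())
    return controls


print(sorted(required_controls({"personal_data", "financial"})))
```

Because the rules are driven by tags rather than by dataset name, classifying a new dataset is enough to bring it under the right controls.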
These metadata items are not mutually exclusive. From the examples above you might have already identified how taxonomy helps with the classification role, as well as providing control and guidance. A single metadata item can serve multiple roles and it is this fact that increases its value.
Given all the things you can do with metadata, the one thing that confuses many practitioners and confounds many data management initiatives is captured in the last sentence of George Firican’s article:
“A single metadata item can serve multiple roles and it is this fact that increases its value.”
Take out the word ‘metadata’ and that statement is true of any data item. The problem, of course, is maintaining the integrity of the values when they are used – and reused – in many different places. The answer is to register individual values *once*, assign each an identity and a semantic class, and declare its equivalent forms in a *context-independent* catalog. For example, “John O’Gorman” has an ID of 2a8f33b4147cc7900; a semantic class of Person; and declared equivalents of “O’Gorman, John” and “John D. O Gorman”.
Then, using the article’s four main roles as a guide, I can use that identifier as metadata anywhere.
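A register-once catalog of this kind might look like the following Python sketch. The `RegisteredValue` and `Registry` names are hypothetical, and the lookup is a plain dictionary; only the ID, semantic class, and equivalent forms come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegisteredValue:
    id: str
    semantic_class: str
    canonical: str
    equivalents: tuple = ()


class Registry:
    """Context-independent catalog: every known form resolves to one entry."""

    def __init__(self) -> None:
        self._by_form = {}

    def register(self, value: RegisteredValue) -> None:
        # Index the canonical form and each declared equivalent.
        for form in (value.canonical, *value.equivalents):
            self._by_form[form] = value

    def resolve(self, form: str) -> Optional[RegisteredValue]:
        return self._by_form.get(form)


registry = Registry()
registry.register(
    RegisteredValue(
        id="2a8f33b4147cc7900",
        semantic_class="Person",
        canonical="John O’Gorman",
        equivalents=("O’Gorman, John", "John D. O Gorman"),
    )
)

match = registry.resolve("O’Gorman, John")
print(match.id if match else "not registered")
```

Any system that stores the identifier instead of the text can then reuse the value for classification, description, guidance, or control without keeping its own copy of the name.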