I've seen and heard many people from different organizations talking about data governance, discussing what it is, and emphasizing its importance. Despite this, there seems to be some confusion about what data governance truly entails. Maybe that's because it's not as widely recognized as I would hope, or maybe it's overshadowed by disciplines such as data science, AI, or machine learning. Whether people work directly with data governance or have merely heard of it, the reality is that it impacts everyone, either directly or through its absence. Yet, despite its widespread impact, not everyone has a clear understanding of what data governance is.
Data Governance in the news
Data governance doesn't make it into the news too often, but when it does, it does so with a bang. The story that stood out to me the most was published back in October 2020. It was about how the Office of the Comptroller of the Currency (OCC) fined Citibank $400 million over persistent issues in risk management, data governance and internal controls.
Why did it stand out? Mainly because I'm often asked about the importance of data governance. There are a lot of reasons to invest in data governance and many benefits that it brings, but this article put a large price tag on what it can cost not to have it. $400 million can get you a lot of data governance :).
Of course, there are many more examples of data governance being mentioned in the news. There are plenty of mentions of gigantic GDPR fines (e.g., a €746 million fine for Amazon), Twitter's $150M penalty over the privacy of users' data, Google's numerous fines, and so on. Take your pick.
There are a lot of examples out there of the fines and reputational damage that an organization can incur for a lack of data governance, but maybe this is also where some of the confusion about what data governance is comes from. Because of this news coverage, data governance often gets equated with data security and privacy, or with regulatory compliance. No, data governance is not the same as data security or data privacy. Yes, data governance can enable regulatory compliance, but that's only one of its facets. So let's see and understand what data governance is.
What is Data Governance?
There are plenty of definitions out there, but before presenting others, I would like to walk you through mine:
Data Governance is a discipline which provides the necessary policies, processes, standards, roles and responsibilities needed to ensure that data is managed as an asset.
So what does that mean exactly? It means that if you need to improve data quality, ensure information security, enable master data management, etc. you need to have a solid foundation tying all these practices together and defining and enabling the processes, tools, and resources needed to make these practices successful. Even simpler? Data governance provides the necessary guidance to manage your data as an asset.
Understanding Data Governance
Who has heard of the expression "Data is the new oil"? It was coined in 2006 by Clive Humby, a mathematician and the architect of the Tesco Clubcard (a supermarket rewards program). He said this because, as they were gathering data from their customers via the Clubcard and analyzing it, he saw the insights they were discovering and realized that it was as if he had struck oil.
Editor's note: I think that there are many more differences between data and oil than there are similarities, so even though this might not be the best comparison, there is one message that stands out to me out of this and that is that "data is important".
The message is this: "Data is important! Data is valuable! Data is an asset!" For this data to be treated as the asset that it is, and to get the most out of its value, a few things need to be ensured, namely that this data is:
- Accessible to the right people and systems
- Defined and understood
Therefore we need:
- POLICIES: to ensure that we get clean data, documented metadata, categorized data, classified data, and so on
- PROCESSES: to establish well understood steps to clean this data, to ensure its consistency, to define it, to provide access to it, to secure it, etc.
- STANDARDS: to ensure consistency in our cleanliness, metadata definitions, etc.
- ROLES & RESPONSIBILITIES: to define and assign who will be the one(s) creating all of the above policies, processes, and standards, as well as who will approve them, maintain them, enforce them, etc.
And this is again what data governance is:
"A discipline which provides the necessary policies, processes, standards, roles and responsibilities needed to ensure that data is managed as an asset."
If there's one thing I would like you to retain from this, it is the following phrase:
Data governance defines who can take what action, upon what data, in what situations, using which methods while following established policies, standards, processes, roles and definitions.
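As a rough illustration of that phrase, the sketch below models a single rule as data and checks a request against it. Everything in it (the `AccessRule` class, the `is_allowed` function, the roles, datasets, and tools) is hypothetical and only there to make the idea concrete; a real program would derive its rules from its own approved policies and tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    """One hypothetical rule: who may do what, to which data, when, and how."""
    role: str        # who
    action: str      # what action
    dataset: str     # upon what data
    situation: str   # in what situation
    method: str      # using which method


# Illustrative rules only; real ones would come from approved data policies.
RULES = {
    AccessRule("data_steward", "update", "customer_master", "issue_resolution", "stewardship_tool"),
    AccessRule("analyst", "read", "customer_master", "reporting", "bi_platform"),
}


def is_allowed(role: str, action: str, dataset: str, situation: str, method: str) -> bool:
    """Return True only if an explicit rule covers the full request."""
    return AccessRule(role, action, dataset, situation, method) in RULES


print(is_allowed("analyst", "read", "customer_master", "reporting", "bi_platform"))    # True
print(is_allowed("analyst", "update", "customer_master", "reporting", "bi_platform"))  # False
```

The sketch denies anything that is not explicitly listed. That is one possible design choice, not a requirement of data governance itself.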
Other names for Data Governance
Data governance sometimes gets a bad reputation because of the word "governance". To some, governance suggests a very judicial, bureaucratic, controlling, and overall restrictive system, so certain organizations shy away from naming it "data governance". That is why I've seen it branded as
- "Data Enablement" or
- "Data Excellence"
I personally don't mind calling it data governance, as I think concepts should be called what they are, but whatever alternative name you choose is fine as long as it is still data governance under the hood.
Alternative definitions for Data Governance
- Data Governance Institute: Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.
- DAMA: The exercise of authority, control, and shared decision-making (planning, monitoring, and enforcement) over the management of data assets.
- SAP: The practice of organizing and implementing policies, procedures and standards for the effective use of an organization’s structured/unstructured information assets.
- TDAN: The exercise and enforcement of decision-making authority over the management of data assets and the performance of data functions.
- Experian: Data governance is a process to ensure data meets precise standards and business rules as it is entered into a system. Data governance enables businesses to exert control over the management of data assets. This process encompasses the people, process, and technology that is required to ensure that data is fit for its intended purpose.
- Informatica: Data governance encompasses the strategies and technologies used to make sure business data stays in compliance with regulations and corporate policies.
- IBM: Data governance is a quality control discipline for adding new rigor and discipline to the process of managing, using, improving and protecting organizational information.
- DGPO 2014 Board Members (George Firican, Davida Berger, Michele Koch, Sal Passariello, Erin Kieffner): A discipline that provides clear-cut policies; procedures; standards; roles; responsibilities; and accountabilities to ensure that data is well-managed as an enterprise resource.
- IQ International (John Ladley, Danette McGilvray, Anne-Marie Smith and Gwen Thomas): The organization and implementation of policies, procedures, structure, roles, and responsibilities that outline and enforce rules of engagement, decision rights, and accountabilities of the effective management of information assets.
There are indeed a lot more Data Governance definitions which I haven't posted. Most of them are self-serving, coming from organizations with solutions geared towards data management practices that adapt the definition to fit one of their products. In the end, having so many definitions out there creates confusion, but it also leaves room for your own organization to adopt whichever one fits your needs, purpose, and overall culture.
Do you have your own Data Governance definition?
What is not Data Governance
Data governance should not be confused with Data Quality, Data Security, or Data Privacy. Those are different knowledge areas (as DAMA puts it), disciplines, or functions that are part of Data Management. There is a certain overlap between these data management areas and even others like metadata management, master data management, etc., but they are not the same thing.
Data Governance and Data Management
If you'd like to learn more about this, I encourage you to read my article or watch my video on "What is the difference between data management and data governance?".
Data Governance and Data Quality
I would argue that you can't have good data quality management without data governance in place, or look on the other side of the coin and realize that you have to try really hard for your data quality to remain poor if data governance is in place. To learn more about data quality management and data governance, read this article on data quality and data governance.
Data Governance and Data Privacy
Data Governance and Data Privacy are closely related, as both are concerned with protecting sensitive data and ensuring compliance with legal and regulatory requirements. Organizations should ensure that their Data Governance processes comply with the relevant data privacy laws and regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Data Governance processes should include procedures for data minimization, data retention, and data subject rights management.
Why is Data Governance important?
Data Governance is essential for organizations to effectively manage their data assets and ensure compliance, security, and reliability, so that data is ultimately understood and used in the best way to support business goals.
It helps to ensure compliance with legal and regulatory requirements, improves data quality and consistency, increases efficiency by reducing data duplication and errors, enhances decision-making by providing reliable and accurate data, and protects sensitive information while maintaining data security. It is also important for organizations to ensure that their Data Governance processes comply with the relevant data privacy laws and regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), to protect sensitive data.
At first glance, you can see that data governance helps to:
- Ensure compliance with legal and regulatory requirements
- Improve data quality and consistency
- Increase efficiency by reducing data duplication and errors and delivering the necessary metadata
- Enhance decision-making by providing reliable, accurate, and understood data
- Protect sensitive information and maintain data security
But data governance brings many more benefits, which can be categorized under:
- Data governance program
- Data operations
- Business operations
- Organization strategy
If you want to learn more about these benefits, how to build a business case for data governance, and ultimately how to implement a program or improve the one you have, check out the Practical Data Governance: Implementation online course.
Key components of Data Governance
Data Governance council
A data governance council is a governing body that is responsible for the
- Strategic guidance of the data governance program
- Prioritization for the data governance projects and initiatives
- Approval of organization-wide data policies and standards
- Enabling of ongoing support, understanding and awareness of the data governance program.
The council should be composed of high-level representatives from different departments and functions, such as IT, Legal, Compliance, and the Business units. The council should meet regularly to review the priorities of the program and remove any roadblocks along the way.
Data Stewardship
Data Stewardship operationalizes data governance. It is the day-to-day management of specific data assets, including metadata management, data quality, security, and compliance. Data Stewards are responsible for ensuring that the data is accurate, complete, and consistent, and that it meets the needs of the business. They also ensure that the data is protected and that any data breaches or data quality issues are reported and resolved. Data Stewards are usually appointed from the business units and they work closely with the Data Governance Council.
Data Governance metrics
To measure the effectiveness of the Data Governance processes, organizations should establish Data Governance Metrics. These metrics can include the number of data breaches, the time taken to resolve data quality issues, the number of data policies violated, the percentage of data that meets the quality standards, and so on. The metrics can be split into 2 main categories:
- Progress metrics
- Impact metrics
By monitoring these metrics, organizations can identify areas for improvement and take action on the Data Governance deliverables, but also socialize the metrics and showcase the program's impact. If you want to learn more about these metrics and their subcategories, check out the Practical Data Governance: Implementation online course.
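To make two of the metrics named above a bit more tangible, here is a minimal sketch of how the time taken to resolve data quality issues and the percentage of data meeting a quality standard could be computed. The issue log, the sample records, and the very naive "contains an @" email check are made-up assumptions for illustration, not a recommended quality rule.

```python
from datetime import date

# Hypothetical log of data quality issues: (opened, resolved or None if still open)
issues = [
    (date(2024, 1, 2), date(2024, 1, 5)),
    (date(2024, 1, 10), date(2024, 1, 12)),
    (date(2024, 1, 20), None),
]


def mean_days_to_resolve(issue_log):
    """Average resolution time in days, counting resolved issues only."""
    durations = [(closed - opened).days for opened, closed in issue_log if closed is not None]
    return sum(durations) / len(durations) if durations else None


def percent_meeting_standard(records, passes_standard):
    """Share of records (0-100) for which the quality check returns True."""
    if not records:
        return 0.0
    return 100.0 * sum(1 for r in records if passes_standard(r)) / len(records)


emails = ["a@example.com", "", "b@example.com", "not-an-email"]

print(mean_days_to_resolve(issues))                          # 2.5
print(percent_meeting_standard(emails, lambda e: "@" in e))  # 50.0
```

Tracked over time, numbers like these are what make it possible to show whether the program is actually moving the needle.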
Data Governance Tools
To support the Data Governance processes, organizations can use a variety of Data Governance tools, such as Data Governance Software, Data Quality Tools, and Data Security Tools. Data Governance Software can automate many of the Data Governance processes, such as data lineage tracking, data quality monitoring, and data policy enforcement. It can also help maintain business semantics (i.e. the Business Glossary, which is a main deliverable for Data Governance) and their relationships to data lineage, data elements, data quality, etc. Data Quality Tools can help to identify and resolve data quality issues, such as data duplication and data inconsistencies. Data Security Tools can help to protect sensitive data and detect any data breaches.
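For a feel of what such a tool keeps track of, the sketch below shows one possible shape for a business glossary entry linked to the data elements it describes. The `GlossaryTerm` class, its fields, the `approve` function, and the rule that only the owner can approve are all assumptions made for this example and do not reflect any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """A hypothetical business glossary entry."""
    name: str
    definition: str
    owner: str                                               # accountable business owner
    status: str = "draft"                                    # e.g. draft -> approved
    data_elements: list[str] = field(default_factory=list)   # linked physical columns


def approve(term: GlossaryTerm, approver: str) -> GlossaryTerm:
    """Mark a term approved; in this sketch only its owner may approve it."""
    if approver != term.owner:
        raise PermissionError(f"{approver} does not own '{term.name}'")
    term.status = "approved"
    return term


customer = GlossaryTerm(
    name="Active Customer",
    definition="A customer with at least one purchase in the last 12 months.",
    owner="head_of_sales",
    data_elements=["crm.customers.last_purchase_date"],
)
approve(customer, "head_of_sales")
print(customer.status)  # approved
```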
Data policies, standards, processes and workflows
These are some of the core components of a data governance program:
- Data policies, such as data access policy, data integrity policy, data integration policy
- Data standards, such as address standard, name standard, clinical terms standard, etc.
- Data processes and workflows, such as data quality issue resolution process, data policy maintenance, technical and business metadata management, etc.
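As an example of how a data standard can be turned into something checkable, here is a small sketch of a made-up "name standard". The three rules (no leading or trailing whitespace, single spaces between parts, only ASCII letters, hyphens, and apostrophes) are assumptions for illustration; a real name standard would need to handle accented and non-Latin names, which this one rejects.

```python
import re

# Hypothetical "name standard": parts made of letters, hyphens, or apostrophes,
# separated by single spaces.
NAME_PATTERN = re.compile(r"^[A-Za-z'\-]+( [A-Za-z'\-]+)*$")


def violations(name: str) -> list[str]:
    """List the rules of the hypothetical standard that a name breaks."""
    problems = []
    if name != name.strip():
        problems.append("leading or trailing whitespace")
    if "  " in name:
        problems.append("repeated spaces")
    # Check characters on a whitespace-normalized copy so spacing problems
    # are not reported twice.
    if not NAME_PATTERN.match(" ".join(name.split())):
        problems.append("disallowed characters")
    return problems


print(violations("Ada Lovelace"))     # []
print(violations(" Ada  Lovelace "))  # ['leading or trailing whitespace', 'repeated spaces']
print(violations("Ada L0velace"))     # ['disallowed characters']
```

Returning the list of broken rules, rather than a plain pass or fail, gives a data quality issue resolution process something concrete to act on.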
Data Governance implementation steps
Implementing data governance is no easy feat, but it is very rewarding. It's the gift that keeps on giving. It can be split into pre-implementation steps, implementation steps, and ongoing steps.
Before you start investing heavily in a data governance program, you need to assess where your organization is in its preparation and desire for data governance. As such you need to:
- Understand your organization's why for data governance
- Assess your organization's challenges and maturity level
Only then will you have what you need to get buy-in from your program's sponsor(s), leadership, and key stakeholders. To do that you will create a business case and then select and secure a sponsor.
The implementation steps include developing a scope document and the guiding principles for data governance, establishing the early stages of your data domain model and the data governance organizational framework, and identifying and assigning data stewards. Once this foundation is in place you can:
- Develop metrics and KPIs
- Develop and deploy data policies
- Develop and deploy data standards
- Develop and deploy data processes/workflows
- Select and deploy tools
Introducing data governance in the organization means introducing change, and that comes with a plethora of challenges that put the data governance program at risk of not being adopted. Therefore, as ongoing steps you should invest in continuous:
- Knowledge and training
- Rewards and acknowledgement
Data governance best practices
- Treat it as a program and a business discipline: Data governance is not a project. It does not have an end date and it requires ongoing investment, support, and exposure.
- Establish clear roles and responsibilities: Clearly define the roles and responsibilities of the data governance council, data stewards, and other stakeholders.
- Communicate often: Communicate the data governance plan and procedures to all relevant parties. Moreover, communicate what was, is, and will be done, and tie it back to the why.
- Regularly review and update the governance framework: Regularly review and update the governance framework to ensure it remains effective and relevant.
- Monitor compliance: Monitor compliance with the data governance plan and take appropriate action when violations occur.
- Involve stakeholders: Involve stakeholders in the data governance process to ensure buy-in and participation.
- Ensure data quality: Implement processes to ensure data quality and consistency, such as data validation and data profiling.
- Invest in a common glossary from the start: Establish a business glossary with shared and approved business terms and data definitions, backed by a clear curatorship and ownership process.
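For the data quality practice in the list above, data profiling can start very small. The sketch below counts rows, nulls, distinct values, and repeated values for one column at a time. The sample rows and the `profile` function are made up for this example; dedicated data quality tools go much further.

```python
from collections import Counter

# Made-up sample records for illustration.
rows = [
    {"customer_id": 1, "country": "US", "email": "a@example.com"},
    {"customer_id": 2, "country": "us", "email": None},
    {"customer_id": 2, "country": "CA", "email": "c@example.com"},
]


def profile(records, column):
    """Basic profile of one column: row, null, and distinct counts plus repeated values."""
    values = [r.get(column) for r in records]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
    }


print(profile(rows, "customer_id"))
# {'rows': 3, 'nulls': 0, 'distinct': 2, 'duplicated_values': [2]}
print(profile(rows, "email"))
# {'rows': 3, 'nulls': 1, 'distinct': 2, 'duplicated_values': []}
```

Even this simple profile surfaces a repeated `customer_id` and a missing email, and profiling `country` would report "US" and "us" as two distinct values, which is the kind of inconsistency a data standard is meant to remove.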
Data Governance is essential for organizations to effectively manage their data as an asset and ensure compliance, security, reliability, and a data-informed organization. By implementing a comprehensive data governance program, organizations can improve data quality, increase efficiency, and support better decision-making, to name some of the main benefits that it brings. Regardless of the definition of data governance, it is something that all organizations aiming to be data-driven or data-informed need to invest in.