We often hear that data quality and data governance belong together: that one cannot have good data quality without data governance, and that by doing data governance we achieve data quality. How so? What does that mean?
What is the relationship between data governance and data quality? Or are they the same thing? Let's find out.
Data quality management
Let's look at data quality first. Data Quality, or Data Quality Management to be more exact, is focused on ensuring that data adheres to our data quality dimensions. In other words, that data is:
- Complete
- Valid
- Accurate
- Timely
- Consistent
- Etc.
Data quality has several dimensions, which I will cover in a separate article. For now, what matters is that data quality management ensures our data adheres to them.
As Dr. Peter Aiken puts it, data quality ensures our data is "fit for purpose". To put it simply, data quality management ensures that we have data of good quality: data that is clean.
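To make "adheres to our data quality dimensions" a little more concrete, here is a minimal Python sketch that scores two of the dimensions, completeness and validity, over a handful of records. The records, field names, email pattern, and country reference list are all hypothetical and chosen only for illustration; in practice these rules would come from your own data quality standards.

```python
import re
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical sample records; field names and values are illustrative only.
RECORDS: List[Record] = [
    {"customer_id": "C001", "email": "ana@example.com", "country": "DE"},
    {"customer_id": "C002", "email": None, "country": "FR"},
    {"customer_id": "C003", "email": "not-an-email", "country": "XX"},
    {"customer_id": "C004", "email": "li@example.com", "country": "US"},
]

# Assumed validation rules for this example, not an official standard.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
VALID_COUNTRIES = {"DE", "FR", "US"}


def is_filled(value: Optional[str]) -> bool:
    """A value counts as filled if it is neither missing nor an empty string."""
    return value is not None and value != ""


def completeness(rows: List[Record], field: str) -> float:
    """Completeness: share of rows where the field has a non-empty value."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if is_filled(r.get(field))) / len(rows)


def validity(rows: List[Record], field: str, is_valid: Callable[[str], bool]) -> float:
    """Validity: share of non-empty values that pass the given rule."""
    values = [r[field] for r in rows if is_filled(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


if __name__ == "__main__":
    email_ok = lambda v: EMAIL_PATTERN.match(v) is not None
    country_ok = lambda v: v in VALID_COUNTRIES
    print(f"email completeness: {completeness(RECORDS, 'email'):.2f}")          # 0.75
    print(f"email validity:     {validity(RECORDS, 'email', email_ok):.2f}")    # 0.67
    print(f"country validity:   {validity(RECORDS, 'country', country_ok):.2f}")  # 0.75
```

Accuracy and timeliness are harder to score this way, since they usually need a trusted reference source or timestamps to compare against, which is one reason they deserve their own treatment.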
Let's expand our understanding of data quality management and look at it from the point of view of DAMA International. Data quality is one of the 11 data management knowledge areas identified by the Data Management Association International (DAMA):
DAMA-DMBOK2 Data Management Framework
Data architecture
Data modeling & design
Data storage & operations
Data security
Data integration & interoperability
Document & content management
Reference & master data
Data warehousing & business intelligence
Metadata
Data quality
Data governance
According to DAMA, Data Quality Management consists of “the planning, implementation and control of the activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers.”
Data Quality actually has a role in most of the other data management domains, and the other way around. Think about it: you need good data quality if you want to ensure data security; metadata's relationship with data quality is also a two-way street; data architecture plays a role in the quality of data, and data quality in turn shapes data architecture; and so on. We can go around the DAMA wheel and find a data quality influence in each one of those areas. So you need an enabler, a connector to ensure all these data management practices come together. That connector is Data Governance.
Data quality with or without data governance
It will probably come as no surprise that there are different definitions of what data governance is. Check out my other article on data governance to go over them, or the one about data governance and data management for more details and a better understanding of what each of these is. But if you'd like to take a shortcut, data governance is “the discipline which provides the necessary policies, processes, standards, roles and responsibilities needed to ensure that data is managed as an asset.”
By now you might say, "OK, I kind of get it, but we have data quality without having data governance." If that's the case, I think there are two possible realities:
- There is no data governance
- Data governance exists, but it is undercover
Let's look at these two cases in more detail.
1. There is no data governance
If data governance is nonexistent, then we'll probably encounter one, some, or all of these cases:
1. Data quality is not enterprise-wide: the data quality initiative/program is most likely not enterprise-wide. Even if it's focused on an enterprise system such as an ERP or a CRM, that does not make data quality enterprise-wide. The program might have a wide reach across the enterprise, but it is localized to one system, and that's an issue. It is an issue especially in larger organizations, because data quality rules might have been created with input only from that system's stakeholders, yet they can affect people who are neither stakeholders nor users of the system.
2. Data quality efforts are localized: if data quality is not localized to a system, it is probably localized to a particular department or departments. That can create other issues which I won't go into in detail right now, though some of them are outlined in the following points.
3. There's a lack of common standards: data quality standards might be created, but only as they pertain to the needs of a particular department, business unit, or system. As soon as other departments or systems are onboarded into the data quality program, these standards will conflict, requiring workarounds or complete changes.
4. There are no clear roles and responsibilities: there might be assigned resources who take responsibility for cleansing and maintaining data, but the types of resources differ from one team to another. It might also be unclear who owns what, and who has ownership when conflicts in standards, definitions, and priorities arise.
5. Data quality management is mostly reactive: this is usually the sign of a data quality program in its early stages, but also of one without data governance. Data quality issues are identified and dealt with reactively, without always tackling the root cause of the problem.
2. Data governance exists, but it is undercover
You might say: "Well, we're not in as bad a place as the one you've just described." You might actually have:
- A data quality policy
- Data quality standards
- Data quality metrics & KPIs
- Defined roles and responsibilities
- Defined processes, procedures, etc.
Then you probably have what I like to call "undercover data governance". You probably have a lot of the pieces of data governance, but without a defined data governance organizational framework, without having formalized data governance. You have more data governance elements than you think, which is good: it's a good place to be in, as you can use this momentum and the work already done to start formalizing data governance.
Data quality and data governance relationship
I think I've already made it clear from the previous section that data governance and data quality exist in a symbiotic relationship. They are two sides of the same coin. You can't have good data quality without data governance, and a data governance implementation would have to be really ineffective not to address data quality.
There is actually quite a bit of overlap between data quality and data governance, for example in:
Data rules
Data standards
Data auditing
Data validation
Ongoing evaluation
Data enhancement
Data quality dimensions
Metrics & KPIs
Reporting
Prioritization
Ongoing improvement
Processes & procedures
Communication & change management
Data governance describes who needs to do what, to what data, under what conditions, and which processes, procedures, tools, and overall best practices to use. Much of this benefits data quality, but not only data quality. The standards, metrics, roles and responsibilities, data rules, and so on benefit data quality, hence the overlap, but they also directly benefit master data management, data accessibility, data integration, metadata management, business intelligence, even data security, and so on.
Of course, there are also areas that pertain only to data quality, such as data profiling, data matching, root cause analysis, and data cleansing. Likewise, some areas sit on the data governance side: data accessibility, data compliance, data policies, and roles & responsibilities.
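The "who does what, to what data" framing can be sketched in code as well. This minimal Python example attaches governance context (a dimension, an accountable owner, the standard being enforced) to a single data quality rule, so that a failing check is reported together with the role responsible for it. The `GovernedRule` structure, the `evaluate` helper, the role name, and the standard name are all hypothetical illustrations, not part of DAMA-DMBOK or any tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class GovernedRule:
    """A data quality rule plus the (hypothetical) governance context around it."""
    name: str
    dataset: str                  # "to what data"
    field: str
    check: Callable[[Any], bool]  # the data quality rule itself
    dimension: str                # which data quality dimension it serves
    owner: str                    # "who" is accountable (hypothetical role)
    standard: str                 # which standard it enforces (hypothetical)


def evaluate(rule: GovernedRule, rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run the rule and report the result alongside its owner, for escalation."""
    failed = [r for r in rows if not rule.check(r.get(rule.field))]
    return {
        "rule": rule.name,
        "dataset": rule.dataset,
        "dimension": rule.dimension,
        "owner": rule.owner,
        "failed": len(failed),
        "total": len(rows),
    }


if __name__ == "__main__":
    country_rule = GovernedRule(
        name="country_in_reference_list",
        dataset="customers",
        field="country",
        check=lambda v: v in {"DE", "FR", "US"},
        dimension="Validity",
        owner="Customer Data Steward",
        standard="Country code standard (hypothetical)",
    )
    rows = [{"country": "DE"}, {"country": "XX"}, {"country": None}]
    print(evaluate(country_rule, rows))  # failed: 2 of 3, routed to the steward
```

The `check` function on its own is data quality work; the owner and standard fields are the governance layer wrapped around it, which is where the two disciplines meet.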
Conclusion
Data quality is often one of the drivers of data governance and the initial focus of a data governance program, which may explain the confusion between the two. But again, they are not the same. They are two sides of the same coin, and you can't have one without the other.