7 Mistakes to Avoid When Building a Data Quality Program

Just as an organization's human resources, equipment, and intellectual property are assets, so is its data. Managing data quality is essential for the organization to be able to rely on it and benefit from it, yet many data quality initiatives fail when these common mistakes are ignored.

These are the 7 mistakes to avoid when building a data quality program:

1. Not treating it as a program

Many organizations still treat this as a project, a one-time fix to an immediate issue. A lot of resources are wasted on a one-time data cleanse which only benefits a specific segment of a project's deliverables (such as a system integration, report development, customer analysis, data migration, and so on). The problem is that even though it solves the immediate need, it is not sustainable. It does not follow data quality management best practices, and as such, bad data creeps in, the cleansed data might not meet other purposes, and after some time it ends up in the same poor state it used to be in. Data quality cannot be achieved through a project, but through a program, where ongoing resources are invested and clear ownership and goals are defined and met.

2. Determining data quality issues solely on anecdotal information

As part of the data quality analysis phase, interviews are performed with various business stakeholders in order to better understand the root of data quality issues, develop data standards, and gain a better understanding of the business and technical environment. It's no secret that many employees are affected by bad data, and some will be quite vocal about it. Some stakeholders can provide a lot of anecdotal information and details of how their daily tasks are affected and what their workarounds are. I've seen data quality assessments with hundreds of transcripts of these types of discussions and anecdotes, but none of this gives you an accurate picture of what the biggest data quality challenges are, nor what their causes or impacts look like. The data itself needs to be profiled and measured against business or industry standards, and also correlated to the business needs and drivers. One individual's data quality pains might not be as important as another's from the point of view of the organization and its needs. Don't let anecdotal information fool you into thinking you can pinpoint what the focus of your data quality program should be. The business need is key.
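To ground the point that data should be profiled and measured rather than judged by anecdote, here is a minimal profiling sketch. The column names, values, rules, and reference list are all illustrative assumptions, not a real assessment:

```python
import pandas as pd

# Hypothetical customer extract; columns and values are assumptions for illustration.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104, None],
    "email": ["a@x.com", "not-an-email", None, "d@x.com", "e@x.com"],
    "country": ["CA", "Canada", "US", "US", "CA"],
})

# Completeness: share of non-null values per column.
completeness = df.notna().mean()

# Validity: share of emails matching a simple pattern (a stand-in for a real standard).
valid_email = df["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False).mean()

# Consistency: values that fall outside an agreed reference list of country codes.
reference_codes = {"CA", "US"}
nonstandard = sorted(set(df["country"]) - reference_codes)

print(completeness.round(2).to_dict())  # e.g. {'customer_id': 0.8, 'email': 0.8, 'country': 1.0}
print(round(valid_email, 2))            # 0.6
print(nonstandard)                      # ['Canada']
```

Measurements like these can then be weighed against the business drivers, which is exactly what a pile of interview transcripts cannot do on its own.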

3. Having IT as the owner

Every program requires ownership in order to be successful. For a data quality program, IT usually takes the reins or gets assigned as its owner. I see this very often, and it immediately raises a red flag about the future of the program. Why is this an issue? While I agree that data can be highly technical and has a major technical component, it also gets created, acquired, maintained, and used based on business needs. IT is not responsible for defining the business needs, rules, and standards with which data should comply. It is really the business side that outlines what the data quality requirements are. Of course, the business then forms a much needed partnership with IT to get these implemented.

4. Not standardizing master reference data from the get go

It’s estimated that anywhere between 20-50% of the tables in a database house reference data. Master reference data is found across multiple systems used by an organization, and when it is not managed, its poor quality has a compounding effect on data analysis, reporting, and data integration. This should be the first type of data to bring under your data quality program’s umbrella. For example, even if your mandate is to cleanse customers’ addresses, your first focus should be on the reference tables for countries, states/provinces, and address attributes such as address type and usage criteria. It’s low-hanging fruit with a lot of benefits.
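As a small sketch of why country reference data comes first: before any address cleanse, free-text country values need to map to one agreed standard. The mapping table and sample values below are assumptions for illustration:

```python
# Hypothetical crosswalk from raw country values to ISO 3166-1 alpha-2 codes.
COUNTRY_MAP = {
    "canada": "CA", "ca": "CA", "can": "CA",
    "united states": "US", "usa": "US", "us": "US", "u.s.a.": "US",
}

def standardize_country(raw):
    """Return the standard code for a raw country value, or None if unrecognized."""
    if raw is None:
        return None
    return COUNTRY_MAP.get(raw.strip().lower())

print(standardize_country(" USA "))   # US
print(standardize_country("Canada"))  # CA
print(standardize_country("Kanada"))  # None -> route to a data steward for review
```

Once this reference table is agreed and governed, every downstream address rule (state/province lookups, postal code formats) can key off the standardized code instead of raw text.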

Read more about the 5 best practices for managing reference data.


5. Not becoming proactive

It’s very normal to be reactive when you start a data quality program; this is an expected step of any data management maturity model. However, you cannot afford to remain in reactive mode for long. Your data quality program needs to create the data quality controls that prevent bad data from being created or imported into your ecosystem and, more importantly, fix the root causes of bad data.
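A proactive control in this sense is a quality gate at the point of entry: records that break the agreed rules are rejected or routed for review instead of being cleansed after the fact. A minimal sketch, with hypothetical fields and rules:

```python
import re

# Hypothetical entry rules; real rules would come from the business standards.
RULES = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str)
             and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}

def validate(record):
    """Return the list of fields violating a rule; an empty list means it passes."""
    return [field for field, rule in RULES.items()
            if not rule(record.get(field))]

good = {"customer_id": 101, "email": "a@x.com"}
bad = {"customer_id": -1, "email": "not-an-email"}
print(validate(good))  # []
print(validate(bad))   # ['customer_id', 'email']
```

The same rules can run in a web form, an API, or an import job, which is what moves the program from cleaning up bad data to preventing it.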

6. Investing in software first

Using software is an important and efficient way of tackling data quality. After all, you wouldn’t want to manually go through every record to cleanse it, or manually profile and analyze terabytes of data. That being said, this should not be your first investment. Any software needs a lot of configuration and defined rules to help you analyze and cleanse your data. Buying it before your business rules, standards, processes, and procedures are in place will only waste your money, as the software will sit waiting for these inputs and remain unusable. Software is not the solution, but merely part of it.

7. Not communicating enough

I believe a lot in the power and benefits of communication. I think it can make or break any program or initiative and for a data quality program this is no different. The most prevalent communication pattern I see is the following:

  1. Communicate to the organization that a data quality program is underway and what it looks like
  2. Communicate with specific stakeholders or departments when specific projects need their involvement
  3. Communicate the results (maybe)

This is not enough. Leading a good data quality program needs a lot more communication. You need to:

  1. Communicate what the program will do and why this is important
  2. Communicate what the program is doing and how this impacts stakeholders – tie this to the business goals
  3. Communicate what the program has done, celebrating the results and the stakeholders who took part in it

Also ensure you have multiple mediums for delivering these communications. Some stakeholders like to be actively kept up to date (by participating in presentations, taking part in status report meetings, receiving newsletters, etc.), while others prefer a more passive approach (referring to a website, bulletin board, digital signage, Twitter feed, etc.).

Bottom line, communication is important.


Of the disciplines needed for data management, data quality management is often viewed as an organizational necessity. To be successful, the mistakes listed above need to be avoided. Please feel free to share your lessons learned and other mistakes we should all learn from.


About the author 

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
