The trifecta of the best data quality management
Data is an enterprise asset, enabling the organization to make informed decisions. In order to secure data’s position as a highly valuable asset, the organization needs to implement a data quality management program. This will achieve and sustain a high level of quality for the data the enterprise is generating and consuming.
The following is a high-level overview of the three phases a sustainable data quality management program needs to cover: analyze and identify, fix and prevent, and communicate. Once you’ve identified the scope of the data you need to manage, take it through each of the following phases:
1. Analyze and Identify
Any sustainable program needs to encompass:
- Analyze your environment: start by understanding your information environment, including the various systems and their dependencies, but also make sure you understand the business and cultural environment. Together these give you a more complete situational analysis and the right leads for determining the root cause of bad data.
- Identify the standards: you can’t measure if you don’t know what to measure against. Identify what the industry standards are for the type of data you’re managing and adopt them verbatim or adapt them to match your enterprise’s needs.
- Analyze your data quality level against the standards: simply measure. Data profiling is usually done at this step at column, database or cross-database level.
- Analyze the business impact: some argue this needs to be done before measuring your data quality level, but in my opinion it needs to be a mix. You need to at least do an initial measurement so you can better determine the impact your data quality would have on the business if left unresolved. It can be tricky to determine the cost of missed opportunities, but you can determine cost savings for things like reducing product defects or increasing efficiencies in data flow processes. Here’s a guide on how to determine the cost of bad data quality.
- Identify the needed resources: what skills, tools, number of people and amount of time are needed to tackle the issues?
- Identify data owners, stewards, and custodians: an important step, which a data governance program will support by ensuring these responsibilities are actually assigned. (See the following article if you want to better understand the different types of data steward.)
- Identify the root cause of bad data: there are several approaches for root cause analysis (RCA) one could take here: safety-based, production-based, assembly-based, process-based, failure-based, and systems-based, or a combination of them. You can apply different techniques, such as the fishbone diagram exercise or the 5 whys.
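To make the measuring step more concrete, here is a minimal sketch of column-level data profiling in plain Python. The `profile_column` helper, the sample `customers` records and the choice of metrics (completeness, distinct count, most common value) are assumptions made for illustration, not a prescribed method; a real profile would measure against whatever standards you identified above.

```python
from collections import Counter

def profile_column(rows, column):
    """Return basic quality metrics for one column of a list of dict records."""
    values = [row.get(column) for row in rows]
    # Treat None and blank strings as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    total = len(values)
    return {
        "column": column,
        "rows": total,
        "missing": total - len(present),
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }

# Hypothetical sample records, for illustration only.
customers = [
    {"id": 1, "country": "CA", "email": "a@example.com"},
    {"id": 2, "country": "ca", "email": ""},
    {"id": 3, "country": "US", "email": None},
    {"id": 4, "country": "CA", "email": "d@example.com"},
]

for name in ("country", "email"):
    print(profile_column(customers, name))
```

In this sample, the `country` column is fully populated but has three distinct spellings for two countries, which is the kind of lead a profile gives you before you start the root cause analysis.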
2. Fix and Prevent
- Fix the data: ideally a step you only have to do once. Fixing the data so it meets the standards you put in place in the previous phase can be done either in-house or through an external service, against an external data set (e.g. address cleansing) or internal rules (e.g. any reference data).
- Prevent bad data from recurring: a key step for ensuring the sustainability of your data quality program and keeping your data quality in check. This can be a hefty step, as it can include redefining the business process, adding data validation rules at the point of entry, adding or rewriting your ETL(s), and putting all the needed training materials and sessions in place to keep bad data from reappearing. Last but not least, data quality audits also need to be established to monitor your data quality status and flag exceptions when they arise. Luckily, most of the scripts created for analyzing the data quality level can be repurposed and turned into an audit report.
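As a rough illustration of the prevention step, the sketch below defines a small set of validation rules once and uses them twice: to reject a record at the point of entry, and to produce an audit count over records that are already stored. The rule set, field names, the simplified email pattern and the `accept_at_entry` and `audit` helpers are all hypothetical; your own rules would come from the standards you adopted in the first phase.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical rules: each maps a field name to a check that
# returns True when the value is acceptable.
RULES = {
    "email": lambda v: isinstance(v, str) and EMAIL_PATTERN.match(v) is not None,
    "country": lambda v: v in {"CA", "US"},  # assumed reference list
}

def validate(record):
    """Return the list of field names that break a rule."""
    return [field for field, check in RULES.items() if not check(record.get(field))]

def accept_at_entry(record):
    """Point-of-entry check: reject the record before it is stored."""
    failures = validate(record)
    if failures:
        raise ValueError(f"rejected, invalid fields: {failures}")
    return record

def audit(records):
    """Audit report: count rule failures per field across stored records."""
    report = {field: 0 for field in RULES}
    for record in records:
        for field in validate(record):
            report[field] += 1
    return report

# Hypothetical stored records, for illustration only.
stored = [
    {"email": "a@example.com", "country": "CA"},
    {"email": "not-an-email", "country": "ca"},
    {"email": None, "country": "US"},
]
print(audit(stored))
```

Keeping the rules in one place means the entry check and the audit cannot drift apart, which is one way to reuse your analysis scripts as an audit report.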
3. Communicate
This last phase is not sequential to the previous two, but something that needs to be done in parallel. Be mindful of the following considerations to help develop specific communications around data management:
- Audience: who is responsible, who is accountable, who needs to be consulted and who needs to be informed about specific steps of your data quality management, its scope and outcome? A RACI matrix is recommended.
- Message: what needs to be communicated to each of the above audience types? How is each one impacted by the changes, how do they need to be involved and why should they care?
- Timing and communication tools: what is your communication timeline and when would each audience need to be engaged? Always remember to communicate what action will be taken and what the expected outcome is, and then always follow up with what action was taken and what the actual outcome was. More detail is better, but keep it audience-specific. Remember to tie these outcomes to the business goals and remind the stakeholders why data quality is important. Here are some avenues of communication you can consider:
- Organizational e-newsletters
- Lunch and learns and skill sessions
- Intranet and RSS feeds
- Traditional media prints
- Team meetings
- Project based communication
Download the free communication plan template if you need one.
Have you implemented any of these so far? What else do you think should be included?