In the article “The trifecta of the best data quality management”, I addressed why a data quality program is needed and which recurring steps you should always go through to sustain continuous efforts to improve and/or maintain the quality of your data. When it comes to working with the data itself, there are 3 main projects every data steward should tackle:
1. Data quality assessment project
Data Quality Management Phase: Analyze & Identify (Phase I)
Purpose: To understand the state of data quality quantitatively, qualitatively, and objectively
Assumptions:
- A particular set of data has been chosen
- The business need and impact are understood
- Data ownership is assigned
- Project sponsorship is in place
Expected results and deliverables:
- A data quality scorecard with initial scores
- Data quality scripts
- True cost of data quality improvements and ROI calculations
High level components:
- Plan – Identify and secure necessary resources and create a project plan for a particular data set.
- Build DQ rules – Use a data quality tool or a query script (e.g., a SQL procedure) to convert business rules into data quality rules.
- Execute DQ rules – Run the tool or script and save the output.
- Test and improve – Work with the business and data experts to evaluate the results and see if the data quality scripts need to be tweaked.
- Store DQ results – Create the scorecard and populate it with the snapshot of your data quality metrics. Recording the baseline of your data quality status is one of the most important steps people forget to take.
- Evaluate & improve – Review your findings, the workflow you followed, and the resources it required, and note what could be improved for future data quality assessment projects. At this step, you should also calculate the costs and benefits of cleansing this data.
- Communicate – In my data quality management practice, communication is always a critical step. It does not simply follow the previous steps in sequence. Instead, it should occur before, during, and after each step in order to increase visibility and gain stakeholder support for your data quality efforts. Don’t overlook its importance.
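As a rough illustration of the Build, Execute, and Store steps above, here is a minimal Python sketch. The sample records, the rule names, and the helper functions `run_rules` and `scorecard_snapshot` are all hypothetical and exist only for this example; a real assessment would more likely run inside a data quality tool or as SQL against the chosen data set.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Hypothetical sample records; in practice these come from the chosen data set.
CUSTOMERS = [
    {"id": 1, "email": "ann@example.com", "postal_code": "90210"},
    {"id": 2, "email": "", "postal_code": "1234"},
    {"id": 3, "email": "bob@example", "postal_code": "10001"},
    {"id": 4, "email": "cy@example.org", "postal_code": None},
]


@dataclass(frozen=True)
class DQRule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


def has_five_digit_postal_code(record: dict) -> bool:
    value = record.get("postal_code")
    return isinstance(value, str) and len(value) == 5 and value.isdigit()


# Business rules expressed as data quality rules (illustrative assumptions).
RULES = [
    DQRule("email_present", lambda r: bool(r.get("email"))),
    DQRule(
        "email_has_domain",
        lambda r: "." in (r.get("email") or "").partition("@")[2],
    ),
    DQRule("postal_code_5_digits", has_five_digit_postal_code),
]


def run_rules(records, rules):
    """Execute every rule and return the score and failing ids per rule."""
    results = {}
    total = len(records)
    for rule in rules:
        failed = [r["id"] for r in records if not rule.check(r)]
        score = round(100 * (total - len(failed)) / total, 1) if total else None
        results[rule.name] = {"score": score, "failed_ids": failed}
    return results


def scorecard_snapshot(results, snapshot_date):
    """Flatten the results into dated scorecard rows: the baseline."""
    return [
        {"date": snapshot_date.isoformat(), "rule": name, "score": r["score"]}
        for name, r in results.items()
    ]


results = run_rules(CUSTOMERS, RULES)
baseline = scorecard_snapshot(results, date.today())
for row in baseline:
    print(row)
```

Keeping the failing ids next to each score gives the business and data experts something concrete to review in the Test and improve step, and the dated rows are what later measurements would be compared against.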
Stakeholders: Data owners, consumers, and data stewards
Implementation team: IT developers, subject matter experts (technical and business), data custodians
If you want to know about the different types of data stewards, please read our other article, too.
2. Data cleansing project
Data Quality Management Phase: Fix & Prevent (Phase II)
Purpose: To correct any data quality issues
Assumptions:
- A baseline measurement has been taken
- Data stewardship is assigned
- Project sponsorship is in place
Expected results and deliverables:
- Data cleansing process and data cleansing scripts
- Improved data quality
- Data change audit trail
High level components:
- Plan – Identify and secure necessary resources and create a project plan for cleansing this data. This will look different if you need to purchase any third-party data services or sources to validate against.
- Prepare data – Sometimes, the decision is made to cleanse only a subset of the data set (such as only your residential national addresses). The data might also need to be extracted from your database and passed on to a third-party data service, and this is the step where that is handled.
- Develop scripts – Build your in-house data cleansing scripts (usually repurposing the work done in the data assessment project), and/or create the technical procedures for loading the cleansed data back into your database. Test and improve until you comply with your data quality rules.
- Cleanse – Run your scripts and cleanse the data.
- Handle exceptions – You should always expect to encounter exceptions and to be unable to cleanse all of the data you intended to. At this step you could start other data quality projects, postpone the work and push the affected data into an exception table, or redevelop your data cleansing scripts to resolve the exceptions if possible.
- Communicate – Same as in any other project, make sure you communicate at every step. If you’re worried about communication overload, you can limit yourself to announcing the project’s intention at the beginning and its outcome at the end.
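The Cleanse and Handle exceptions steps above, together with the data change audit trail listed as a deliverable, can be sketched as follows. This is a minimal example under stated assumptions: the records, the simplistic `is_valid_email` rule, and the shape of the audit and exception rows are hypothetical, and a real project would typically write these to database tables instead of Python lists.

```python
from datetime import datetime, timezone

# Hypothetical records extracted for cleansing.
records = [
    {"id": 1, "email": "  Ann@Example.COM "},
    {"id": 2, "email": "bob@example.org"},
    {"id": 3, "email": "not-an-email"},
    {"id": 4, "email": None},
]


def is_valid_email(value):
    """Deliberately simple rule: exactly one '@' and a dot in the domain."""
    if not isinstance(value, str) or value.count("@") != 1:
        return False
    local, _, domain = value.partition("@")
    return bool(local) and "." in domain


def cleanse(rows):
    """Return (cleansed rows, audit trail, exception table)."""
    cleansed, audit_trail, exceptions = [], [], []
    for row in rows:
        old = row["email"]
        # Cleansing action: trim whitespace and lowercase the address.
        new = old.strip().lower() if isinstance(old, str) else old
        if not is_valid_email(new):
            # Cannot be fixed automatically: park it and leave the row as-is.
            exceptions.append(
                {"id": row["id"], "field": "email", "value": old,
                 "reason": "fails email rule after normalization"}
            )
            cleansed.append(dict(row))
            continue
        if new != old:
            # Record every change so it can be reviewed or reverted.
            audit_trail.append(
                {"id": row["id"], "field": "email", "old": old, "new": new,
                 "changed_at": datetime.now(timezone.utc).isoformat()}
            )
        cleansed.append({**row, "email": new})
    return cleansed, audit_trail, exceptions


cleansed, audit_trail, exceptions = cleanse(records)
print(len(audit_trail), "changes;", len(exceptions), "exceptions")
```

Rows that land in the exception list are left untouched on purpose, so that a later script revision or a manual review starts from the original values.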
Stakeholders: Data owners, consumers, and data stewards
Implementation team: IT developers, subject matter experts (technical and business), data custodians
3. Bad data prevention project
Data Quality Management Phase: Fix & Prevent (Phase II)
Purpose: To prevent similar data quality issues from recurring in the future, or at least reduce how often they do
Assumptions:
- Data quality issues have been identified
- The scope of bad data prevention has been defined
- Data stewardship is assigned
- Project sponsorship is in place
Expected results and deliverables:
- Alignment between data quality and business needs
- Improved data quality
- Reduced cost of performing data cleansing
High level components:
- Plan – Identify and secure necessary resources and create a project plan.
- Analyze causes – Your business and technical analysts need to help you with this step. What is the root cause of the data quality issues, and is there more than one? Is it a lack of standards and definitions, a misalignment between the business needs and the technical implementation, or a technical limitation?
- Determine Solutions – Depending on the root causes, you might need to create a new ETL process, change a piece of the data architecture, update an existing business process to remove ambiguity that might be creating bad data, implement data quality controls at the point of entry, change existing standards, or just communicate requirements better. This step is a topic of its own.
- Implement Solutions – Develop and deploy whatever solution you’ve agreed on in the previous step. It’s usually more than one thing that needs to be done.
- Monitor Results – Even if you think you’ve done everything to prevent the same data quality issues from ever recurring, chances are that they will. Always create an audit for your data quality rules to flag any outliers so you can then analyze their cause and implement new solutions. Work on the audit piece should reuse the scripts developed in the assessment and/or data cleansing projects.
- Communicate – Keep reminding your stakeholders of the importance of keeping your data clean. Always communicate and raise awareness.
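Two of the ideas above, data quality controls at the point of entry and an audit that flags regressions, can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the rules, the `validate_at_entry` and `audit_against_baseline` helpers, the scores, and the one-point tolerance are not taken from any particular tool.

```python
# Hypothetical data quality rules, reused from an assessment project.
RULES = {
    "email_present": lambda r: bool(r.get("email")),
    "postal_code_5_digits": lambda r: (
        isinstance(r.get("postal_code"), str)
        and len(r["postal_code"]) == 5
        and r["postal_code"].isdigit()
    ),
}


def validate_at_entry(record, rules=RULES):
    """Return the names of violated rules; an empty list means accept."""
    return [name for name, check in rules.items() if not check(record)]


def audit_against_baseline(current_scores, baseline_scores, tolerance=1.0):
    """Flag rules whose score is missing or fell more than `tolerance`
    points below the recorded baseline."""
    flagged = {}
    for rule, base in baseline_scores.items():
        now = current_scores.get(rule)
        if now is None or now < base - tolerance:
            flagged[rule] = {"baseline": base, "current": now}
    return flagged


# Point-of-entry control: reject or route the record before it is stored.
violations = validate_at_entry({"email": "", "postal_code": "90210"})
print("violations:", violations)

# Monitoring: compare the latest scores with the baseline scorecard.
baseline_scores = {"email_present": 98.0, "postal_code_5_digits": 95.0}
current_scores = {"email_present": 97.5, "postal_code_5_digits": 91.0}
flagged = audit_against_baseline(current_scores, baseline_scores)
print("flagged:", flagged)
```

The tolerance keeps small fluctuations from raising an alert each time the audit runs; what counts as a meaningful drop is a decision for the data owners and stewards.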
Stakeholders: Data owners, data stewards, business leads, process owners
Implementation team: IT developers, subject matter experts (technical and business process), data custodians, business analysts
To reiterate, these 3 projects should be part of data quality management and data governance programs. Each project is influenced by the project management methodology you choose to follow, as well as available resources, scope, industry, and organizational culture.
If you are a data steward, you should be part of at least one of these projects. Hopefully this provides you with enough guidance on the steps each project should cover, what stakeholders you should involve, who you should include in your project teams, and what deliverables you should strive for.