What data quality testing skills

The impulse to cut project costs is often strong, especially in the final delivery phase of data integration and data migration projects. At this late phase of the project, a common mistake is to delegate testing responsibilities to resources with limited business and data testing skills.

Data integrations are at the core of data warehousing, data migration, data synchronization, and data consolidation projects.

In the past, most data integration projects involved data stored in databases. Today, it’s essential for organizations to also integrate their database or structured data with data from documents, e-mails, log files, websites, social media, audio, and video files.

Using data warehousing as an example, Figure 1 illustrates the primary checkpoints (testing points) in an end-to-end data quality testing process. Shown are points at which data (as it’s extracted, transformed, aggregated, consolidated, etc.) should be verified – that is, extracting source data, transforming source data for loads into target databases, aggregating data for loads into data marts, and more.

Only after data owners and all other stakeholders confirm that a data integration was successful, can the whole process be considered complete and ready for production.

checkpoints for for data integration projects
Figure 1: Checkpoints that are necessary to audit and verify data quality in data integration projects. A data warehouse data integration example is shown in this figure. (©Tricentis)

The skills and experiences of project testers is paramount for project success

The data integration/migration testing lead and other hands-on testers are expected to demonstrate extensive experience in their ability to plan, design, and execute data integration source and target testing – strategies and tactics to ensure data quality throughout all stages of a data extract, transform, and extract (ETL) life cycle.

In recent years, there has been an evolving trend toward business analysts, ETL developers, and even business users to plan and conduct data integration, data migration, and data warehouse testing. But doing so may be risky.

Among the required skills for data integration testers are the following:

  • A firm understanding of data warehouse and database concepts
  • The ability to develop strategies, test plans, and test cases specific to data integration and the enterprise’s business
  • The ability to create effective ETL test cases and scenarios based on ETL database loading technology and business requirements
  • Advanced skills with SQL queries, stored procedures, and test scripting to delete, define, load, and merge data for tests
  • In-depth understanding of ETL development tools
  • Advanced knowledge of project business data and metadata (data sources, data tables, data dictionary, business terminology)
  • Innovative skills with data profiling using associated methods and tools
  • Understanding of data models, data mapping documents, ETL design, and ETL coding
  • Ability to communicate effectively with data engineers, DB designers and developers
  • Testing experiences with multiple DB systems, (e.g., Oracle, SQL Server, DB2, Postgresql)
  • Troubleshooting of ETL (e.g., Informatica/DataStage) sessions, workflows, and logs
  • Ability to deploy database code to test environments
  • Use of Microsoft Excel, and a variety of data quality verification tools
  • Implementation of automated testing for ETL processes
  • Effective use of defect management and tools

Conclusion

Data integration projects can fail for many reasons: Poor data architecture, inconsistently defined data, inability to combine data from different data sources, missing and inaccurate data values, inconsistent use of data fields, unacceptable query performance, and so forth.

These project risks can be diminished with well-trained and motivated testers who provide ongoing support from the earliest phases of the data integration development. We hope that this article helps you take the first steps towards that end.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

About the author 

Wayne Yaddow

Wayne Yaddow is an independent consultant with more than 20 years’ experience leading data integration, data warehouse, and ETL testing projects with J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, Oppenheimer Funds, and IBM. He taught IIST (International Institute of Software Testing) courses on data warehouse and ETL testing and wrote DW/BI articles for Better Software, The Data Warehouse Institute (TDWI), Tricentis, and others. Wayne continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at wyaddow@gmail.com.

You may also like:

>