In the first article of this series on DW/BI regression testing, DW/BI regression test planning was defined accordingly:
- Testing performed after developing functional improvements or repairs to data and reports. Its purpose is to determine if changes have regressed other attributes of data and reports.
- A repetition of tests intended to show that the software’s overall behavior is unchanged except as required by adjustments to the software or data
- Testing conducted for the purpose of evaluating whether specific changes to the system has introduced new defects
In this article, we delve deeper into a process for selecting regression test cases to be run after changes to DW ETL code or data.
A regression test selection process may be considered effective and efficient when its objective is to select, from the original/current test suite, test cases that can reveal defects in new or modified data, ETL’s, or report generating programs.
Test case prioritization involves selecting test cases that reveal maximum defects in the components of the software and assigning high significances to them.
Steps to Develop Priorities for DW/BI Regression Tests
Change identification is the first step in change impact analysis. We differentiate between two types of changes in the database applications environment:
Code Changes: Involves changes made to the code of the database modules (ex., ETL’s)
Database Component Changes: Involves changes made to the definition of the database components or actual data
Questions to Consider When Planning DW/BI Regression Tests
- What changes in requirements and code were applied according to specifications
- Which ETL processes were changed and what are the new or changed logic that was implemented?
- Which stored procedures and views were changed or added?
- Which business rules were changed and applied to ETL logic?
- Which tables, text files, views, and related fields were changed, deleted, or added?
- Which table relationships were changed (primary, foreign keys, natural keys)?
- What data source to target mappings were changed and why?
Regression testing is important for data load processes whether ETL’s were developed through tools such as Informatica and DataStage or user-developed stored procedures.
Planning for ETL regression tests, testers must understand how tables relate to each other (ex., through an examination of data models) and use the knowledge, along with user specifications, to accurately determine which data warehouse data should be identical or changed across ETL test runs. Effective test tools (or manual processes) allow for quickly detecting and displaying differences between the new ETL results and the reference results from an earlier date.
Even when the test data is fixed in the input sources for the ETL, some factors may change. For example, the order of fetched data rows may vary in the relational model. Additionally, attributes obtained from sequences may have different values in separate runs. However, actual values for surrogate keys assigned values from sequences are not interesting, whereas it indeed is interesting how rows are “connected” with respect to primary key/foreign key pairs.
Conclusion
Software re-validation involves essentially four issues: change impact identification, test suite maintenance, test strategy, and test case selection. In database applications, a number of features unique to data is supported such as SQL statements, table constraints, exception programming, and table triggers. These features introduce new difficulties that hinder regression test selection.
DW/BI regression testing is an important activity for software development and software maintenance. Such testing ensures that modified software continues to satisfy its intended requirements after changes, additions, and deletions. If not done properly, regression testing can be an unnecessarily expensive process in an attempt to revalidate modified software and data introduced into previously tested code.