What you need to know about regression testing on DW/BI projects (part 2)


In the first article of this series, DW/BI regression testing was defined in the following ways:

Testing performed after functional improvements or repairs are made to data and reports. Its purpose is to determine whether the changes have regressed other attributes of the data and reports.
A repetition of tests intended to show that the software's overall behavior is unchanged except as required by adjustments to the software or data.
Testing conducted to evaluate whether specific changes to the system have introduced new defects.

In this article, we delve deeper into a process for selecting regression test cases to be run after changes to DW ETL code or data.

A regression test selection process can be considered effective and efficient when it selects, from the current test suite, the test cases that can reveal defects in new or modified data, ETLs, or report-generating programs.

Test case prioritization involves identifying the test cases most likely to reveal defects in the changed components of the software and assigning them the highest priority, so they run first.
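Selection and prioritization can be illustrated with a small sketch. It assumes a hypothetical `TestCase` record that lists the components (ETLs, tables, reports) each test exercises and how many defects it found in earlier cycles. Selection keeps tests that touch at least one changed component; prioritization then ranks them by how many changed components they cover, breaking ties by past defect yield. This scoring rule is a common coverage-based heuristic chosen for illustration, not a method prescribed by this series.

```python
from dataclasses import dataclass


@dataclass
class TestCase:
    # Hypothetical record for one regression test case.
    name: str
    components: set[str]          # ETLs, tables, views, reports the test exercises
    past_defects_found: int = 0   # defects this test revealed in earlier test cycles


def select_tests(suite: list[TestCase], changed: set[str]) -> list[TestCase]:
    """Keep only the tests that exercise at least one changed component."""
    return [t for t in suite if t.components & changed]


def prioritize(selected: list[TestCase], changed: set[str]) -> list[TestCase]:
    """Order tests by changed components covered, then by past defect yield (highest first)."""
    return sorted(
        selected,
        key=lambda t: (len(t.components & changed), t.past_defects_found),
        reverse=True,
    )


if __name__ == "__main__":
    suite = [
        TestCase("load_customer_dim", {"etl_customer", "dim_customer"}, 3),
        TestCase("load_sales_fact", {"etl_sales", "fact_sales", "dim_customer"}, 5),
        TestCase("monthly_revenue_report", {"rpt_revenue", "fact_sales"}, 1),
        TestCase("load_product_dim", {"etl_product", "dim_product"}, 2),
    ]
    changed = {"etl_customer", "dim_customer", "fact_sales"}
    print([t.name for t in prioritize(select_tests(suite, changed), changed)])
```

Here `load_product_dim` is dropped because it touches nothing that changed, and `load_sales_fact` outranks `load_customer_dim` because both cover two changed components but it has found more defects in the past.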

Steps to Develop Priorities for DW/BI Regression Tests

Change identification is the first step in change impact analysis. We distinguish between two types of changes in a database application environment:

Code Changes: changes made to the code of database modules (e.g., ETLs).

Database Component Changes: changes made to the definitions of database components or to the actual data.
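A minimal sketch of change identification followed by impact analysis is shown below. It assumes a hypothetical change log of object names and types, a hypothetical mapping of those types onto the two change categories, and a hypothetical lineage graph recording which objects read from which. The impacted set is everything reachable downstream of a changed object (a breadth-first walk), which is the set whose tests become regression candidates.

```python
from collections import deque

# Hypothetical mapping from object type to the two change categories in the text.
CODE_TYPES = {"etl", "stored_procedure"}
DB_COMPONENT_TYPES = {"table", "view", "column", "data"}


def classify(object_type: str) -> str:
    """Return the change category for a changed object's type."""
    if object_type in CODE_TYPES:
        return "code change"
    if object_type in DB_COMPONENT_TYPES:
        return "database component change"
    raise ValueError(f"unknown object type: {object_type}")


def impacted_objects(changed: set[str], downstream: dict[str, list[str]]) -> set[str]:
    """Return the changed objects plus everything reachable from them in the lineage graph."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        obj = queue.popleft()
        for dependent in downstream.get(obj, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    # Hypothetical change log: object name -> object type.
    changes = {"etl_customer": "etl", "dim_customer.segment": "column"}
    for name, object_type in changes.items():
        print(f"{name}: {classify(object_type)}")

    # Hypothetical lineage: object -> objects that read from it.
    lineage = {
        "etl_customer": ["dim_customer"],
        "dim_customer.segment": ["dim_customer"],
        "dim_customer": ["fact_sales", "rpt_customer_segments"],
        "fact_sales": ["rpt_revenue"],
    }
    print(sorted(impacted_objects(set(changes), lineage)))
```

In practice the lineage would come from ETL tool metadata or database dependency views rather than a hand-written dictionary.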

Questions to Consider When Planning DW/BI Regression Tests

Which requirement and code changes were applied according to specifications?
Which ETL processes were changed, and what new or changed logic was implemented?
Which stored procedures and views were changed or added?
Which business rules were changed and applied to ETL logic?
Which tables, text files, views, and related fields were changed, deleted, or added?
Which table relationships were changed (primary keys, foreign keys, natural keys)?
What data source to target mappings were changed, and why?
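Some of these questions, such as which tables and fields were changed, deleted, or added, can be answered mechanically by comparing catalog snapshots taken before and after a release. The sketch below assumes SQLite (via Python's standard `sqlite3` module) purely so it runs self-contained; the table names are hypothetical, and a production warehouse would query its own catalog (for example, `INFORMATION_SCHEMA`) instead.

```python
import sqlite3


def schema_snapshot(conn: sqlite3.Connection) -> dict[str, dict[str, str]]:
    """Map each user table to {column_name: declared_type} using SQLite's catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    snapshot = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        snapshot[table] = {col[1]: col[2] for col in columns}
    return snapshot


def diff_schemas(old: dict[str, dict[str, str]], new: dict[str, dict[str, str]]) -> list[str]:
    """List deleted/added tables, deleted/added columns, and changed column types."""
    findings = [f"table deleted: {t}" for t in sorted(old.keys() - new.keys())]
    findings += [f"table added: {t}" for t in sorted(new.keys() - old.keys())]
    for table in sorted(old.keys() & new.keys()):
        o, n = old[table], new[table]
        findings += [f"column deleted: {table}.{c}" for c in sorted(o.keys() - n.keys())]
        findings += [f"column added: {table}.{c}" for c in sorted(n.keys() - o.keys())]
        for col in sorted(o.keys() & n.keys()):
            if o[col] != n[col]:
                findings.append(f"column type changed: {table}.{col} {o[col]} -> {n[col]}")
    return findings


if __name__ == "__main__":
    before = sqlite3.connect(":memory:")
    before.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER, name TEXT, region TEXT);
        CREATE TABLE stg_orders (order_id INTEGER);
    """)
    after = sqlite3.connect(":memory:")
    after.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER, name TEXT, segment TEXT, region VARCHAR(20));
        CREATE TABLE fact_sales (sale_id INTEGER, customer_key INTEGER);
    """)
    for line in diff_schemas(schema_snapshot(before), schema_snapshot(after)):
        print(line)
```

The same snapshot-and-diff approach could be extended to relationship changes with SQLite's `PRAGMA foreign_key_list`.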

Regression testing is important for data load processes whether the ETLs were built with tools such as Informatica and DataStage or developed as custom stored procedures.

When planning ETL regression tests, testers must understand how tables relate to each other (e.g., by examining data models) and use that knowledge, along with user specifications, to determine accurately which data warehouse data should remain identical and which should change across ETL test runs. Effective test tools (or manual processes) make it possible to quickly detect and display differences between the new ETL results and the reference results from an earlier run.
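Comparing a new ETL result against a reference result can be sketched as a keyed diff. The example assumes rows are dictionaries with a unique business key (here a hypothetical `customer_id`) and that the specification names the columns expected to change in this release (here a hypothetical `segment`). It reports rows missing from the new run, rows that newly appear, and any differences in columns that were supposed to stay identical.

```python
from typing import Any

Row = dict[str, Any]


def compare_results(reference: list[Row], current: list[Row], key: str,
                    expected_to_change: frozenset[str] = frozenset()) -> dict[str, Any]:
    """Keyed diff of two ETL result sets; assumes `key` is unique within each set."""
    ref = {row[key]: row for row in reference}
    cur = {row[key]: row for row in current}
    unexpected = {}
    for k in sorted(ref.keys() & cur.keys()):
        # Only columns that the specification says must stay the same are checked.
        columns = (ref[k].keys() | cur[k].keys()) - expected_to_change
        diffs = {c: (ref[k].get(c), cur[k].get(c)) for c in sorted(columns)
                 if ref[k].get(c) != cur[k].get(c)}
        if diffs:
            unexpected[k] = diffs
    return {
        "missing": sorted(ref.keys() - cur.keys()),
        "extra": sorted(cur.keys() - ref.keys()),
        "unexpected_changes": unexpected,
    }


if __name__ == "__main__":
    reference = [
        {"customer_id": "C1", "name": "Acme", "segment": "SMB"},
        {"customer_id": "C2", "name": "Globex", "segment": "ENT"},
        {"customer_id": "C3", "name": "Initech", "segment": "SMB"},
    ]
    current = [
        {"customer_id": "C1", "name": "Acme", "segment": "MID"},
        {"customer_id": "C2", "name": "Globex Corp", "segment": "ENT"},
        {"customer_id": "C4", "name": "Umbrella", "segment": "SMB"},
    ]
    print(compare_results(reference, current, "customer_id", frozenset({"segment"})))
```

The change to C1's `segment` is not reported because the release was expected to alter that column; the rename of C2 is flagged because `name` was supposed to stay identical.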

Even when the test data in the ETL's input sources is fixed, some factors may still change between runs. For example, the order of fetched rows may vary, because the relational model does not guarantee row order unless a query specifies one. Additionally, attributes populated from sequences may receive different values in separate runs. The actual surrogate key values assigned from sequences are not important to the comparison; what matters is how rows are "connected" through primary key/foreign key pairs.
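One way to compare runs while ignoring row order and sequence-assigned values is to drop the surrogate keys and re-express each foreign key through the natural key it points to, then compare the results as multisets. The sketch below assumes a hypothetical customer dimension (surrogate `customer_key`, natural `customer_id`) and a fact table holding `customer_key` and `amount`; `collections.Counter` serves as the multiset, so order is ignored but duplicate rows still count.

```python
from collections import Counter

# Hypothetical column names: customer_key is a sequence-assigned surrogate key,
# customer_id is the natural (business) key carried over from the source system.


def normalized_run(dim_rows: list[dict], fact_rows: list[dict]) -> tuple[Counter, Counter]:
    """Drop surrogate keys and express fact-to-dimension links through natural keys."""
    natural_by_surrogate = {d["customer_key"]: d["customer_id"] for d in dim_rows}
    dims = Counter((d["customer_id"], d["name"]) for d in dim_rows)
    # A fact whose surrogate key has no dimension row is recorded as an orphan.
    facts = Counter(
        (natural_by_surrogate.get(f["customer_key"], "<orphan>"), f["amount"])
        for f in fact_rows
    )
    return dims, facts


def same_connections(run_a: tuple, run_b: tuple) -> bool:
    """True when two (dim_rows, fact_rows) runs hold the same rows, linked the same way."""
    return normalized_run(*run_a) == normalized_run(*run_b)


if __name__ == "__main__":
    run1 = (
        [{"customer_key": 101, "customer_id": "C1", "name": "Acme"},
         {"customer_key": 102, "customer_id": "C2", "name": "Globex"}],
        [{"customer_key": 101, "amount": 50.0},
         {"customer_key": 102, "amount": 75.0},
         {"customer_key": 101, "amount": 20.0}],
    )
    # Same data, but the sequence produced different keys and rows arrived in another order.
    run2 = (
        [{"customer_key": 8, "customer_id": "C2", "name": "Globex"},
         {"customer_key": 7, "customer_id": "C1", "name": "Acme"}],
        [{"customer_key": 8, "amount": 75.0},
         {"customer_key": 7, "amount": 20.0},
         {"customer_key": 7, "amount": 50.0}],
    )
    print(same_connections(run1, run2))
```

If a later run linked a 50.0 sale to Globex instead of Acme, the normalized fact multisets would differ and the comparison would fail, even though every surrogate key value looks plausible on its own.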

Conclusion

Software revalidation essentially involves four issues: change impact identification, test suite maintenance, test strategy, and test case selection. Database applications support a number of data-specific features, such as SQL statements, table constraints, exception handling, and table triggers. These features introduce difficulties that complicate regression test selection.

DW/BI regression testing is an important activity in both software development and software maintenance. It helps ensure that modified software continues to satisfy its intended requirements after changes, additions, and deletions. If not done properly, regression testing can become an unnecessarily expensive effort to revalidate modified software and the data introduced into previously tested code.

About the author

Wayne Yaddow

Wayne Yaddow is an independent consultant with more than 20 years’ experience leading data integration, data warehouse, and ETL testing projects with J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, Oppenheimer Funds, and IBM. He taught IIST (International Institute of Software Testing) courses on data warehouse and ETL testing and wrote DW/BI articles for Better Software, The Data Warehouse Institute (TDWI), Tricentis, and others. Wayne continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at wyaddow@gmail.com.

