What you need to know about regression testing on DW/BI projects

Regression testing on large data integration DW/BI development projects is challenging. The number of test cases is often massive, and the impact of a single change may spread widely across the system. For high-level integration regression testing, the retest-all approach consumes substantial time and resources.

To counter challenges with regression testing, this article proposes test scenarios based on changes to ETL logic and data. Selected test scenarios should be semi-formal representations of detailed system requirements with their test inputs, outputs, and conditions defined. By using test dependency information, the QA team can use a test slicing algorithm that identifies the scenarios that are affected and thus are candidates for regression testing.
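The selection idea above can be illustrated with a minimal sketch. It assumes test dependency information is available as a simple mapping from each scenario to the ETL objects it exercises; the scenario names, object names, and the `select_affected_scenarios` helper are hypothetical and are not taken from any specific tool or from a particular slicing algorithm.

```python
from typing import Dict, List, Set

# Hypothetical test dependency information:
# scenario name -> ETL objects (tables, mappings, reports) the scenario exercises.
TEST_DEPENDENCIES: Dict[str, Set[str]] = {
    "TS01_customer_load": {"stg_customer", "dim_customer"},
    "TS02_sales_fact_load": {"stg_sales", "dim_customer", "fact_sales"},
    "TS03_product_report": {"dim_product", "rpt_product_summary"},
}


def select_affected_scenarios(
    dependencies: Dict[str, Set[str]], changed_objects: Set[str]
) -> List[str]:
    """Return scenarios that touch at least one changed object, sorted by name."""
    return sorted(
        name
        for name, objects in dependencies.items()
        if objects & changed_objects  # non-empty intersection means "affected"
    )


# Example: a change to dim_customer makes two scenarios regression candidates.
print(select_affected_scenarios(TEST_DEPENDENCIES, {"dim_customer"}))
# ['TS01_customer_load', 'TS02_sales_fact_load']
```

Scenarios that share no object with the change set are left out of the regression run, which is the main saving over a retest-all approach.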

What is regression testing?

Following are three ways of understanding regression testing:

Testing performed after developing functional improvements or repairs to data and BI reports. The purpose of those tests is to establish whether the changes have regressed other attributes of data and reports.
A series of tests intended to show that the software’s overall behavior is unchanged except as required by adjustments to the software or data.
Testing conducted for the purpose of evaluating whether specific changes to the system have introduced new failures.

Figure 1 shows the domains of regression testing to be considered after changes to source data, DW data, ETLs, business logic, and business intelligence reports.

Figure 1: Common testing domains across the DW/BI project lifecycle

Common strategies for selecting regression test suites

High-priority and high-risk use cases. Choose baseline tests to rerun by risk heuristics: those whose failure would pose the most risk to data, reports, or dashboards.
End-to-end operational profiles. Choose baseline tests to rerun by allocating time/QA resources in proportion to operational profile risks (source extraction, data staging, data mart loads, etc.).
Business logic and/or data changes. Choose baseline tests to rerun after assessing changes to code and data.
Select from existing test cases. Choose baseline tests for regression testing by analyzing dependencies and relationships with changed or added code.
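As an illustration of the risk-heuristic strategy in the list above, the following sketch ranks baseline tests by a simple risk score. The scoring rule (failure likelihood multiplied by business impact, each on an assumed 1 to 5 scale), the `BaselineTest` class, and the test names are assumptions made for this example; a project would substitute its own risk model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BaselineTest:
    name: str
    failure_likelihood: int  # assumed scale: 1 (unlikely) to 5 (very likely)
    business_impact: int     # assumed scale: 1 (minor) to 5 (severe)

    @property
    def risk_score(self) -> int:
        # Assumed heuristic: risk = likelihood x impact.
        return self.failure_likelihood * self.business_impact


def rank_by_risk(tests: List[BaselineTest]) -> List[BaselineTest]:
    """Order tests from highest to lowest risk; ties are broken by name."""
    return sorted(tests, key=lambda t: (-t.risk_score, t.name))


tests = [
    BaselineTest("staging_row_counts", failure_likelihood=2, business_impact=3),
    BaselineTest("revenue_dashboard_totals", failure_likelihood=4, business_impact=5),
    BaselineTest("dim_customer_history", failure_likelihood=3, business_impact=4),
]

for test in rank_by_risk(tests):
    print(test.risk_score, test.name)
# 20 revenue_dashboard_totals
# 12 dim_customer_history
# 6 staging_row_counts
```

The ranked list can then be cut off wherever the available regression time runs out.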

Recommended strategies for DW/BI regression test planning

Combine the four common strategies from above: Any one of the regression testing strategies above may be adequate for your project, but in practice a combination of the four, as described below, is often a better choice.
It is assumed that first we test each change (fix) by running all related test cases. Then for regression testing:

30% of regression tests: Tests representing the riskiest functions and data; in particular, those identified as affected by changes. When prioritizing components, consider the business risk and how frequently users exercise the changed scenario.
50% of regression tests: Run all tests planned for general regression testing.
20% of regression tests: Exploratory testing. Remember to document the results of exploratory testing properly. For those who do not like the idea of ‘exploratory testing’, use the time in the schedule to improve your understanding of the requirements, the system, and the logical and architectural coverage of the application by test cases, then plan for exploratory testing.

Conventional allocations are 30%, 50%, and 20%; other ratios of regression tests may work better for your project (see Figure 2).

Percentage of regression tests

30%: Risky Changes/Fixes
50%: Planned Regression Tests
20%: Exploratory Tests

Figure 2: Percentages of testing resources and time allocated to the three regression test strategies.
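The allocation arithmetic can be sketched as follows. The 40-hour budget and the `allocate_effort` helper are hypothetical; the 30/50/20 split is the conventional one described above, and the ratios are passed in as a parameter so that a project can substitute its own.

```python
from typing import Dict


def allocate_effort(total_hours: float, ratios: Dict[str, int]) -> Dict[str, float]:
    """Split a regression budget across strategies in proportion to the given ratios."""
    total_ratio = sum(ratios.values())
    if total_ratio <= 0:
        raise ValueError("ratios must sum to a positive number")
    return {name: total_hours * share / total_ratio for name, share in ratios.items()}


# Conventional split from the article; adjust the ratios to suit the project.
RATIOS = {
    "risky_changes_and_fixes": 30,
    "planned_regression_tests": 50,
    "exploratory_tests": 20,
}

# Hypothetical budget of 40 hours for one regression cycle.
print(allocate_effort(40, RATIOS))
# {'risky_changes_and_fixes': 12.0, 'planned_regression_tests': 20.0, 'exploratory_tests': 8.0}
```

Because the function normalizes by the sum of the ratios, a team could pass values such as 40/40/20 without changing the code.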

The selection of test cases for regression testing

Requires knowledge of logic and data changes and/or bug fixes, and of how the software may be affected
Includes the data and business logic areas of frequent defects
Includes the areas which have undergone many and/or recent code changes
Includes the domains which are highly visible to users
Includes the core features of the product which are mandatory requirements by users

Conclusion

By using requirements traceability information, one can uncover affected components and their associated test scenarios and test cases for regression testing.

With information about dependencies and traceability, one can use a flow-effect analysis to identify all logic, data, and scenarios that are potentially affected, directly or indirectly, and from these select a set of test cases for regression testing. Check out Part 2 of this article to learn more about the process for selecting regression test cases to be run after changes to DW ETL code or data.
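One way to picture the analysis described above is a breadth-first traversal of a data lineage graph, followed by a lookup in a traceability map. This is only a sketch under stated assumptions: the lineage edges, object names, test case names, and both helper functions are hypothetical, and the traversal is a generic graph technique rather than the specific process covered in Part 2.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical lineage: object -> objects that read from it (downstream consumers).
DOWNSTREAM: Dict[str, List[str]] = {
    "src_orders": ["stg_orders"],
    "stg_orders": ["fact_orders"],
    "fact_orders": ["rpt_daily_sales"],
    "src_customers": ["dim_customer"],
    "dim_customer": ["rpt_daily_sales"],
}

# Hypothetical traceability: object -> test cases that verify it.
TESTS_BY_OBJECT: Dict[str, List[str]] = {
    "stg_orders": ["TC10_stg_orders_counts"],
    "fact_orders": ["TC20_fact_orders_keys"],
    "dim_customer": ["TC30_dim_customer_dupes"],
    "rpt_daily_sales": ["TC40_daily_sales_totals"],
}


def affected_objects(changed: Iterable[str], downstream: Dict[str, List[str]]) -> Set[str]:
    """Return the changed objects plus everything reachable downstream of them."""
    seen: Set[str] = set(changed)
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for consumer in downstream.get(current, []):
            if consumer not in seen:  # visit each object once, even in cyclic graphs
                seen.add(consumer)
                queue.append(consumer)
    return seen


def select_tests(
    changed: Iterable[str],
    downstream: Dict[str, List[str]],
    tests_by_object: Dict[str, List[str]],
) -> List[str]:
    """Collect the test cases traced to any directly or indirectly affected object."""
    selected: Set[str] = set()
    for obj in affected_objects(changed, downstream):
        selected.update(tests_by_object.get(obj, []))
    return sorted(selected)


# A change to stg_orders also affects fact_orders and rpt_daily_sales.
print(select_tests({"stg_orders"}, DOWNSTREAM, TESTS_BY_OBJECT))
# ['TC10_stg_orders_counts', 'TC20_fact_orders_keys', 'TC40_daily_sales_totals']
```

Objects on unrelated branches, such as dim_customer in this example, are not reached by the traversal, so their tests stay out of the selected set.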


About the author

Wayne Yaddow

Wayne Yaddow is an independent consultant with more than 20 years’ experience leading data integration, data warehouse, and ETL testing projects with J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, Oppenheimer Funds, and IBM. He taught IIST (International Institute of Software Testing) courses on data warehouse and ETL testing and wrote DW/BI articles for Better Software, The Data Warehouse Institute (TDWI), Tricentis, and others. Wayne continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at wyaddow@gmail.com.
