One of my favourite quality stories comes from W. Edwards Deming. Just as Deming was about to begin a 4-day seminar on improving quality at a large enterprise, one of the executives scheduled to take the course came up to him and said he was so busy that week that he could not possibly spend 4 days at the seminar. Could Deming distill the seminar's main message into a few words, so that the executive didn't have to 'waste' 4 days? In the story, Deming smiled and said, "You should focus on reducing variation."

focus on reducing variation

The amazing thing to me about this story is that those two words do such a great job of distilling the ideas behind what Deming called 'profound knowledge', and that 'reduce variation' advice is as applicable today to the manufacture of cars on an assembly line as it is to the creation and management of policyholder records in an insurance company's database. Specifically, Deming would tell the car manufacturer to reduce variation in all the parts and assembly processes that go into making a car customers want to buy, just as he would tell a life insurance executive to measure, analyze, improve and control the data and processes involved in the creation of a life insurance policy record in a database.

When I teach my 6-week course on Data Quality Improvement at BCIT, I try to get the students thinking about how the processes involved in the production of data can be continuously improved. With only 6 weeks, the focus is on teaching them how to use a simple data profiling tool to quickly expose possible data quality issues, evolve those issues into Data Quality Rules (DQRs), and then use simple Excel charts, such as Statistical Process Control and Process Behaviour Charts, to show how the data quality issues vary over time.

[Figure: Mask analysis – patterns in credit card numbers]

The Data Profile of a data set from a business process describes the characteristics of each column in the data set, as well as the 'behaviour' of that column's data over time. For each column, the profile reports basic quality characteristics such as average, variance, patterns, masks, distributions and outliers. Using these, the DQ Analyst can prepare a short list of 'interesting characteristics' about the data set and ask a subject matter expert, i.e. a person familiar with the business process and its outputs, to explain the significance and priority of each Poor Data Quality (PDQ) characteristic. For example, the image above shows that the credit card numbers, when entered, are consistently 16 digits except for 5 incorrect patterns.
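
Mask analysis like this is straightforward to sketch in code. The snippet below is a minimal illustration, not the profiling tool from the course: the sample card-number strings are hypothetical, and the mask convention (digits become '9', letters become 'X') is just one common choice.

```python
import re
from collections import Counter

def mask(value: str) -> str:
    """Expose a value's pattern: digits become '9', letters become 'X'."""
    return re.sub(r"[A-Za-z]", "X", re.sub(r"\d", "9", value))

# Hypothetical card-number entries as a data-entry system might capture them
cards = [
    "4111111111111111",      # clean 16-digit entry
    "4111-1111-1111-1111",   # entered with dashes
    "411111111111111",       # one digit short
    "4111111111111111",
    "4111 1111 1111 1111",   # entered with spaces
]

# Count how often each mask occurs; rare masks flag candidate DQ issues
mask_counts = Counter(mask(c) for c in cards)
for pattern, count in mask_counts.most_common():
    print(pattern, count)
```

Sorting masks by frequency makes the dominant (presumably correct) pattern obvious, and every low-frequency mask becomes an 'interesting characteristic' to take to the subject matter expert.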

With the prioritizations in hand, the DQ Analyst can evolve the significant DQ issues into DQ Rules by analyzing the occurrence of each DQ issue over time to calculate the average number of occurrences per period and the variation of the occurrences over time. The average and variation define the upper and lower occurrence thresholds, and when the DQ Rule violation counts are charted over time, the graph almost magically shows what Donald Wheeler calls the 'Voice of the Process'.
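
Those thresholds can be computed directly from the monthly violation counts. The sketch below uses the natural process limits of an individuals (XmR) chart, the approach Wheeler describes: mean plus or minus 2.66 times the average moving range. The monthly counts are hypothetical figures in the neighbourhood of the article's 165 'no address' records per month.

```python
# Hypothetical monthly counts of DQ Rule violations
violations = [160, 172, 158, 169, 151, 175, 163, 166, 170, 155, 168, 164]

mean = sum(violations) / len(violations)

# Average moving range between consecutive months
moving_ranges = [abs(b - a) for a, b in zip(violations, violations[1:])]
avg_mr = sum(moving_ranges) / len(moving_ranges)

# Natural process limits for an individuals (XmR) chart
upper = mean + 2.66 * avg_mr
lower = max(0.0, mean - 2.66 * avg_mr)  # a count cannot go below zero

print(f"average: {mean:.1f}, limits: [{lower:.1f}, {upper:.1f}]")
```

Points inside the limits are just the routine 'noise' of the process; only a point outside them, or a sustained run on one side of the average, is a signal worth investigating.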

[Figure: Statistical Process Control – 'no address' DQ Violations]

Here, DQR #1723 is run against the data at the end of each month and the number of 'no address' DQR violations is recorded. In this chart you can see that an average of 165 customer records without an address has been created each month, and though there was a major variation in March of 2014, overall this DQ issue is stable.

[Figure: Statistical Process Control – monthly average of 'no address' records]

Since the past is the best predictor of the future, a person wanting to know how many customer records will be entered into the system without addresses next month could safely estimate 165, plus or minus 15. Now that it is aware, the organization can decide that the 'no address' issue is significant and begin continuously improving the upstream processes to ensure customer addresses are entered. In the chart above, covering 2015-16, we can see that the number of 'no address' records has dropped to an average of 140 per month, with steady improvement until the last month of each year.
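
A drop like that (165 down to 140) is exactly what a control chart lets you confirm rather than guess at. The sketch below, with hypothetical monthly figures for the two periods, checks whether the improved period sits below the old lower limit; a sustained run below it is the signal that the process has genuinely shifted and the chart should be re-baselined.

```python
# Hypothetical monthly 'no address' counts before and after the improvement effort
baseline = [165, 170, 160, 168, 162, 166]   # earlier period, ~165/month
improved = [142, 138, 145, 139, 141, 135]   # 2015-16 period, ~140/month

def limits(counts):
    """Natural process limits for an individuals chart: mean +/- 2.66 * avg moving range."""
    mean = sum(counts) / len(counts)
    mrs = [abs(b - a) for a, b in zip(counts, counts[1:])]
    avg_mr = sum(mrs) / len(mrs)
    return mean - 2.66 * avg_mr, mean, mean + 2.66 * avg_mr

lo, base_mean, hi = limits(baseline)

# Every improved-period point below the old lower limit indicates a real
# process shift rather than routine month-to-month noise
shift = all(c < lo for c in improved)
print(f"old limits [{lo:.0f}, {hi:.0f}]; sustained shift detected: {shift}")
```

Once a shift is confirmed, the limits are recomputed from the improved period so the chart speaks with the new 'Voice of the Process'.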

To quote Donald Wheeler:

“The characterization of a process as either predictable or unpredictable is a fundamental dichotomy for data analysis.”

Simply put, if the process is not predictable, how can you manage it? Once you measure the baseline variability of the data set generated by the business process, you can go further and measure the variability of the DQ characteristics that have a significant impact on the integrity and usability of the business process information. With baseline measurements of the 'noise' in the DQ Rule violations, your organization's decision making will steadily improve.


About the author 

Gordon Hamilton

Gordon Hamilton is an unapologetic Data Enthusiast, with years of experience integrating data quality into the data warehouse development process, modeling dimensional and 3NF data models, leading data migrations, analyzing data and helping customers find the signal in the noise. In his downtime, Gordon teaches Data Quality Improvement at BCIT, helps his former students ease Data Governance into their organizations and supports the new DAMA Vancouver Chapter.
