“Not everything that can be counted counts, and not everything that counts can be counted.” (Albert Einstein)
The data deluge and the concept of critical data elements
The digital age is characterized by large volumes of varied data that are present everywhere. Organizations hold many data entities and data elements across subject areas such as customers, products, assets, and finance, along with a large volume of data for each. The data volume grows with each passing day, and the availability of low-cost, high-volume storage makes it possible to keep all of this data.
With so many data elements, and so much data stored in repositories and flowing through organizations’ data pipelines, it’s important to isolate the key data elements and prioritize managing their quality. This is where the concept of critical data elements comes into the picture.
In short, critical data elements are data elements that have a direct or indirect financial impact if the data quality is not up to the mark along one or more data-quality dimensions (Mahanti 2019).
In this article we discuss some key concepts around data, data quality, the importance of quality of critical data and the impact of data on the bottom line.
Some key data concepts
Before we continue, let me explain a few terms related to data.
- Data entities are the real-world objects, concepts, events, and phenomena about which we collect data.
- Data elements are the different attributes that describe the data entity.
Thus, a data entity serves as the container that comprises all the data elements that describe it.
Consider a superstore that has many products: soap, milk, butter, detergents, and the like. A “product” would be the data entity representing a product in the store, and the data elements might be product type (e.g., food, dairy, or cleaning), product ID, product name, product description, manufactured date, expiration date, and so forth. These elements store attribute values for the different products in related data structures (for example, relational tables).
Another term is “data-quality dimensions.” This refers to the characteristics that define the quality of a data element. For the “product” in our example, these would include the presence of useful values for each data element in each record of the product data entity, the timely availability of the data, the accuracy of the data, the absence of duplicated values, and so on. Data-quality dimensions are what provide insight into the quality of the data.
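The entity and element vocabulary can be illustrated with a minimal Python sketch. The class name `Product`, the field names, and the sample values are assumptions chosen to mirror the superstore example, not a prescribed schema.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class Product:
    """Data entity: one instance describes one product in the store."""
    product_id: str
    product_name: str
    product_type: str                       # e.g. "food", "dairy", "cleaning"
    product_description: str
    manufactured_date: date
    expiration_date: Optional[date] = None  # not every product expires


# The data elements are the attributes that describe the entity.
data_elements = [f.name for f in fields(Product)]

# One record of the entity (illustrative values only).
milk = Product("P-001", "Whole milk 1L", "dairy", "Fresh whole milk",
               date(2024, 1, 2), date(2024, 1, 16))
```

In a relational table, each field would correspond to a column and each `Product` instance to a row.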
Data quality
Data are considered of high quality if they are fit for their intended use. In other words, data quality can be defined as an evaluation of whether those data serve a purpose in a given context. Although data quality is a holistic, abstract concept and cannot be measured as such, it has several dimensions or aspects that can be measured. These measurable aspects are known as data quality dimensions. Some examples of data quality dimensions are completeness (whether values are present or absent), uniqueness (the extent to which the data relating to an entity are not duplicated), accuracy (the data values’ closeness to reality), validity (conformance of data values to standards), and timeliness (availability of data in time so that business needs are met).
In the product example referred to earlier, if our purpose is to track total available units of a particular product in the store, then the product number, expiration date (where applicable), and number of units available would be necessary data for that use and would need to be accurate and complete.
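As a rough illustration of how two of these dimensions might be measured, the sketch below computes completeness and uniqueness as simple ratios over a list of records. The function names, the ratio definitions, and the sample records are assumptions made for this example; other definitions of these measures are possible.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def completeness(records: List[Record], element: str) -> float:
    """Share of records that hold a non-empty value for the element."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(element) not in (None, ""))
    return present / len(records)


def uniqueness(records: List[Record], element: str) -> float:
    """Share of distinct values among the non-empty values of the element."""
    values = [r[element] for r in records if r.get(element) not in (None, "")]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical product records: one missing name, one duplicated ID.
products = [
    {"product_id": "P-001", "product_name": "Milk"},
    {"product_id": "P-002", "product_name": ""},
    {"product_id": "P-002", "product_name": "Butter"},
    {"product_id": "P-003", "product_name": "Soap"},
]

name_completeness = completeness(products, "product_name")  # 3 of 4 present
id_uniqueness = uniqueness(products, "product_id")          # 3 distinct of 4
```

Both measures return a value between 0 and 1, which makes it straightforward to compare elements or to set a threshold per element.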
Expiration date might not be applicable for all products. For example, food, dairy products, and cosmetics need to have an expiration date. However, products like cutlery, storage containers, and utensils do not have an expiration date, and hence the expiration date data element will not have values for these products. The product description is not essential data for this purpose.
A data-quality dimension for the “number of units available” data element might be timeliness, measured by how frequently the data are updated. If the data are updated in real time, they would be very useful and thus of high quality.
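Because expiration date applies only to some products, a completeness check on it should arguably count only the records where a value is expected. The sketch below shows one way to do that; the set `PERISHABLE_TYPES`, the field names, and the sample records are assumptions for illustration.

```python
from typing import Any, Dict, List

# Assumption: product types for which an expiration date is expected.
PERISHABLE_TYPES = {"food", "dairy", "cosmetics"}


def expiration_completeness(records: List[Dict[str, Any]]) -> float:
    """Completeness of expiration_date over the records where it applies."""
    applicable = [r for r in records if r["product_type"] in PERISHABLE_TYPES]
    if not applicable:
        return 1.0  # nothing is expected, so nothing is missing
    filled = sum(1 for r in applicable if r.get("expiration_date"))
    return filled / len(applicable)


# Hypothetical records: cutlery legitimately has no expiration date.
records = [
    {"product_id": "P-001", "product_type": "dairy", "expiration_date": "2024-01-16"},
    {"product_id": "P-002", "product_type": "dairy", "expiration_date": None},
    {"product_id": "P-003", "product_type": "cutlery", "expiration_date": None},
    {"product_id": "P-004", "product_type": "cosmetics", "expiration_date": "2025-06-30"},
]

score = expiration_completeness(records)  # 2 of 3 applicable records filled
```

Counting the cutlery record as incomplete would understate the quality of the data, since an empty value is the correct state for that product.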
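One simple way to express such an update-frequency check is to compare the age of the last update with a tolerated maximum. This is a minimal sketch; the function name `is_timely` and the five-minute tolerance are assumptions, and the right tolerance depends on the business need.

```python
from datetime import datetime, timedelta


def is_timely(last_updated: datetime, as_of: datetime, max_age: timedelta) -> bool:
    """True if the value was refreshed within the tolerated age."""
    return as_of - last_updated <= max_age


as_of = datetime(2024, 1, 10, 12, 0)
tolerance = timedelta(minutes=5)  # assumed tolerance for stock counts

# Updated two minutes ago: within tolerance.
fresh = is_timely(datetime(2024, 1, 10, 11, 58), as_of, tolerance)
# Updated a day ago: too old for tracking available units.
stale = is_timely(datetime(2024, 1, 9, 12, 0), as_of, tolerance)
```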
What data is important?
Given the vast number of data elements and the large volumes of data that an organization stores, ensuring the quality of all of an organization’s data is an expensive and resource-intensive exercise, and one that is not recommended. This is because not all data are critical.
Not all data are created equal, and hence they do not all have the same level of importance. Some data elements are critical, and organizations must ensure that they are of high quality and that they fit their intended use. Some data elements are moderately critical. On the other hand, some data elements might not be of any value, and assessing their quality is a waste of time, money, and effort.
For example, many data values are captured and stored for dubious reasons, such as being part of a purchased data model, or retained from a data migration project, but they may not be necessary to achieve any business objectives. Assessing the quality of such data is a waste of time and effort (Mahanti 2019).
Consider a data-profiling exercise that involves measuring the quality of data required for the company’s direct marketing campaign. The question that needs to be answered here is: what data does one need to execute a direct marketing campaign? It would essentially require customer contact data, such as names, addresses, email addresses, and so forth. The right data source containing customer contact data and the right data elements (the fields holding the customer names, addresses, and email addresses) should be selected. However, fields such as those recording comments and job titles are part of the customer contact data but have no business value for the purposes of executing the marketing campaign, so they need not be taken into consideration (Mahanti 2015).
Impact of data on the bottom line
A critical data element can be defined as a data element that supports enterprise obligations or critical business functions or processes, and will cause customer dissatisfaction, pose a compliance risk, or have a direct financial impact if the data quality is not up to the mark along one or more data-quality dimensions (Mahanti 2019).
Customer dissatisfaction and regulatory impact can have an adverse effect on finances. For example, a failure to comply with regulations may cause businesses to pay penalty charges. Disgruntled customers may take their business elsewhere, causing loss of revenue. In general, financial impact may include penalty costs, lost-opportunity costs, increased expenses, or decreased revenue and profit. Thus, the cost associated with the data element, group of data elements, or data entity with respect to different data quality dimensions can be used to determine criticality (Mahanti 2019).
For example, inaccurate name and address data elements in most customer-centric organizations like financial services, telecommunications, utilities, or retail companies can result in huge mailing costs. Hence, for them, address data are critical.
One way to go about understanding the critical data entities and related data elements is by considering the important enterprise obligations that depend on data quality and mapping the data dependencies, i.e., the critical data entities and associated data elements needed to obtain information for each obligation. Data elements that are critical for one enterprise obligation may not be critical for another enterprise obligation.
Enterprise obligations in a retail company, for example, may include sales reporting and consumer-behavior trend reporting. While customer age, annual income, and occupation might be critical data elements for consumer behavior trend reporting, they are not critical data elements for sales reporting.
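The dependency mapping described above can be sketched as a simple lookup from each obligation to the data elements it needs. In this sketch, the obligation keys and the sales-reporting elements are hypothetical; only the three consumer-behavior elements (customer age, annual income, occupation) come from the example in the text.

```python
from collections import Counter

# Hypothetical mapping of enterprise obligations to the elements they depend on.
obligation_elements = {
    "sales_reporting": {
        "product_id", "units_sold", "sale_amount", "sale_date",
    },
    "consumer_behavior_trend_reporting": {
        "customer_age", "annual_income", "occupation",
        "product_id", "sale_date",
    },
}


def is_critical_for(element: str, obligation: str) -> bool:
    """True if the obligation depends on the given data element."""
    return element in obligation_elements[obligation]


# Count how many obligations depend on each element.
usage = Counter(e for elems in obligation_elements.values() for e in elems)
```

Here `customer_age` is critical only for trend reporting, while an element such as `product_id` is needed by both obligations and therefore has a higher usage count.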
On the other hand, there are data elements that might be critical for most enterprise obligations. Enterprise obligations might vary by industry sectors or types of business. The following factors can be used to determine the criticality of data elements:
- Number of enterprise obligations for which the data elements are used;
- Cost associated with the data elements;
- Risks associated with the data elements;
- Number of business units, departments, teams, or business users using the data.
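One possible way to combine the four factors listed above is a weighted score per data element. The sketch below is an illustrative assumption, not a method prescribed by the cited sources: the equal weights, the 0 to 1 rating scales for cost and risk, and the sample profiles are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ElementProfile:
    name: str
    obligations: int  # number of enterprise obligations using the element
    cost: float       # assumed 0..1 rating of cost of poor quality
    risk: float       # assumed 0..1 rating of associated risk
    consumers: int    # business units, teams, or users relying on it


def criticality_score(
    p: ElementProfile,
    max_obligations: int,
    max_consumers: int,
    weights: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25),
) -> float:
    """Weighted sum of the four factors, each scaled to the range 0..1."""
    w_obl, w_cost, w_risk, w_use = weights
    return (
        w_obl * p.obligations / max_obligations
        + w_cost * p.cost
        + w_risk * p.risk
        + w_use * p.consumers / max_consumers
    )


# Hypothetical profiles for two customer data elements.
profiles = [
    ElementProfile("job_title", obligations=1, cost=0.1, risk=0.1, consumers=1),
    ElementProfile("customer_address", obligations=4, cost=0.9, risk=0.6, consumers=5),
]

# Rank elements from most to least critical.
ranked = sorted(
    profiles,
    key=lambda p: criticality_score(p, max_obligations=4, max_consumers=5),
    reverse=True,
)
```

The weights would need to be agreed with business stakeholders, and an organization might reasonably prefer a qualitative high, medium, or low rating over a numeric score.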
In addition to the above, certain data and information are extremely sensitive and can be classified as critical from the perspective of data privacy and security. Reputational damage, litigation costs and fines are some impacts of sensitive data being stolen.
Examples of sensitive data are social security numbers, debit card numbers, credit card numbers, security PINs, pass codes, and passport numbers. Sometimes a data element alone might not be deemed sensitive but becomes sensitive when combined with a group of other data elements. Personally identifiable information is an example of this scenario (Mahanti 2019).
Determining and prioritizing critical data elements is one of the first steps that must be carried out before an organization can embark on assessing the quality of its data against the relevant data-quality dimensions, which are the measurable aspects of data quality. Trying to measure and manage the quality of all data can be an overwhelming and financially infeasible exercise that is bound to fail. Hence, when you think of assessing and improving the quality of data, remember renowned physicist Albert Einstein’s comment: “Not everything that can be counted counts, and not everything that counts can be counted.”
To learn more about data quality, including how to measure data quality dimensions, implement methodologies for data quality management, build a data quality strategy, and account for data quality when undertaking data-intensive projects, read Data Quality: Dimensions, Measurement, Strategy, Management and Governance (ASQ Quality Press, 2019). This article draws significantly from the research presented in that book and is a modified version of the article “Critical Data Elements and Data Quality,” which was first published on QualityDigest.com in March 2020 and later on Medium in 2021.
References
1. Mahanti, Rupa. Data Quality: Dimensions, Measurement, Strategy, Management and Governance. ASQ Quality Press, 2019, p. 526. ISBN 0873899776 (ISBN-13: 9780873899772).
2. Mahanti, Rupa. “Data Profiling Project Selection and Implementation: The Key Considerations.” Software Quality Professional, vol. 17, no. 4, 2015, pp. 44–52.