Determining your data domains is an important part of your data strategy. So what is a data domain?
It can actually mean a couple of things, depending on whether we look at it from the point of view of data management and database management, or from the point of view of data governance. Think of it as the technical side versus the business side. And you might say, "George, why do we care about both? Let's just focus on the data governance side." Well, I think you need to be aware of both.
Even if you work in data governance, you need to understand the technical side. Otherwise, when you talk to technical data stewards, data custodians, and IT, they might use the term differently than you do. Even when you're talking to vendors, it's good to understand both meanings of the term. Yes, I know it's frustrating for the same term to have different meanings, but in data governance it's something you'll get used to, because clarifying these differences is one of the things data governance tries to do.
Data domain (database management)
From a database management point of view, or better yet, a data modeling point of view, a data domain is the set of values that a data element may contain. This is easiest to understand through an example. Imagine an online form with a drop-down field for gender.
When we click on that drop-down, we might get options such as the following:
- Male
- Female
- Non-binary
- Not specified
Of course, there could be other options, depending on your definition of gender. That's not the point. The idea is that the options are fixed. When we record this in a database table, the value assigned to gender can only be one of these 4 values. So we say that the data domain for the gender column is "male", "female", "non-binary", or "not specified".
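To make the database view concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. It enforces the gender data domain with a `CHECK` constraint, so the database itself rejects any value outside the four allowed options. The table name `person`, its columns, and the helper functions are hypothetical, chosen only for illustration.

```python
import sqlite3

# The data domain for the gender column: the only values it may contain.
GENDER_DOMAIN = ("male", "female", "non-binary", "not specified")


def create_person_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical 'person' table whose gender column is limited to GENDER_DOMAIN."""
    allowed = ", ".join(f"'{value}'" for value in GENDER_DOMAIN)
    conn.execute(
        "CREATE TABLE person ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        f" gender TEXT NOT NULL CHECK (gender IN ({allowed}))"
        ")"
    )


def insert_person(conn: sqlite3.Connection, name: str, gender: str) -> bool:
    """Try to insert a row; return False if gender falls outside the data domain."""
    try:
        conn.execute("INSERT INTO person (name, gender) VALUES (?, ?)", (name, gender))
        return True
    except sqlite3.IntegrityError:
        # SQLite raises IntegrityError when the CHECK constraint is violated.
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_person_table(conn)
    print(insert_person(conn, "Alex", "non-binary"))  # True
    print(insert_person(conn, "Sam", "unknown"))      # False
```

Putting the domain in the schema, rather than only in the form, means every system that writes to the table is held to the same set of values.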
Data domain (data governance)
From a data governance perspective, data domain means something else. Here, a data domain is "a logical grouping of items of interest to the organization, or areas of interest within the organization".
You can think of data domains as high-level categories of data, used for assigning accountability and responsibility for that data. By the way, a data domain is also called a "subject area" or a "data concept", so you might encounter any of these terms. Within data governance, they all refer to the same thing.
Note that some people use "data domain" to mean the same thing as a data set. That's not accurate: a data domain can contain multiple data sets, as long as those data sets represent the same area of interest within the organization.
If this is still as clear as mud, let's look at some examples.
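The difference between a domain and a data set can be sketched as a small data structure. This is a minimal illustration, assuming a domain is recorded with an accountable owner and a list of data sets; the `DataDomain` and `DataSet` classes, the data set names, and the system names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class DataSet:
    """A hypothetical data set, identified by its name and the system that holds it."""
    name: str
    system: str


@dataclass
class DataDomain:
    """A logical grouping of data used to assign accountability; not a single data set."""
    name: str
    owner: str  # accountable person or role (illustrative)
    datasets: List[DataSet] = field(default_factory=list)

    def add_dataset(self, dataset: DataSet) -> None:
        # A domain can hold many data sets covering the same area of interest.
        if dataset not in self.datasets:
            self.datasets.append(dataset)

    def systems(self) -> Set[str]:
        # The data sets of one domain can live in different systems.
        return {d.system for d in self.datasets}


product = DataDomain(name="Product", owner="Product data owner")
product.add_dataset(DataSet("product_catalog", "E-commerce platform"))
product.add_dataset(DataSet("product_pricing", "ERP"))
print(len(product.datasets), sorted(product.systems()))
```

Here one "Product" domain groups two data sets from two systems under a single accountable owner.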
Data domain examples
- Product (or Service)
- Vendor (or Supplier)
- Transaction (or Order, or Sale)
- Location
- Legal
An average organization would have anywhere between 5 and 10 data domains, and they aren't always these, although these are usually the most common ones. In the end, it really depends on the industry you're in.
Let's look at some industry specific data domains.
In the education sector, you might have:
In the healthcare sector, you might have:
- Medical procedure
In the insurance sector, you might encounter:
In any of these sectors, you could also have some of the previous data domains. For example, I'm sure all 3 sectors would have "Location", "Transaction", and "Legal" as data domains.
There's also the concept of a data sub-domain. Typically, each data domain will have between 3 and 10 data sub-domains.
What is a sub-domain? It's simply a way to divide that data domain even further into other categories.
There are some considerations, though:
- Each sub-domain is unique
- Each sub-domain belongs to exactly one data domain (a data domain can have many sub-domains, but a sub-domain has only one parent domain)
- A sub-domain inherits the characteristics of its parent data domain
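These considerations can be expressed as a minimal Python sketch. It assumes "characteristics" means attributes such as an owner and a classification level, and that sub-domain names must be unique across the whole model; the `DomainModel`, `DataDomain`, and `SubDomain` classes and their fields are hypothetical, not a standard governance API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class DataDomain:
    name: str
    owner: str
    classification: str  # illustrative characteristic, e.g. "internal"
    subdomains: Dict[str, "SubDomain"] = field(default_factory=dict, repr=False)


@dataclass(eq=False)
class SubDomain:
    name: str
    parent: DataDomain = field(repr=False)  # exactly one parent domain
    owner_override: Optional[str] = None

    @property
    def owner(self) -> str:
        # Inherited from the parent domain unless explicitly overridden.
        if self.owner_override is not None:
            return self.owner_override
        return self.parent.owner

    @property
    def classification(self) -> str:
        # Always inherited from the parent domain.
        return self.parent.classification


class DomainModel:
    """Hypothetical registry: each sub-domain name is unique and has a single parent."""

    def __init__(self) -> None:
        self.domains: Dict[str, DataDomain] = {}
        self._parent_of: Dict[str, str] = {}  # sub-domain name -> parent domain name

    def add_domain(self, name: str, owner: str, classification: str) -> DataDomain:
        domain = DataDomain(name, owner, classification)
        self.domains[name] = domain
        return domain

    def add_subdomain(self, domain_name: str, name: str,
                      owner_override: Optional[str] = None) -> SubDomain:
        if name in self._parent_of:
            raise ValueError(f"sub-domain {name!r} already belongs to {self._parent_of[name]!r}")
        parent = self.domains[domain_name]
        sub = SubDomain(name, parent, owner_override)
        parent.subdomains[name] = sub
        self._parent_of[name] = domain_name
        return sub


model = DomainModel()
model.add_domain("Vendor", owner="Procurement lead", classification="internal")
sla = model.add_subdomain("Vendor", "Service level agreement")
print(sla.owner, sla.classification)  # Procurement lead internal
```

Reading the owner and classification through the parent, instead of copying them, keeps a sub-domain in step if its parent domain's characteristics change.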
Data sub-domain examples
Let me provide you with some sub-domain examples for some of the data domains mentioned above.
- Vendor specification
- Service level agreement
- Geographical area
- Outdoor space
What you should remember is that data domains and sub-domains are a way of grouping the most important data of an organization, and they cut across business units and systems. So the same domain might have stakeholders from different lines of business and departments, and its data can be stored in, produced by, or consumed by different systems.
That being said, reality can be a bit more complicated. When data doesn't fit neatly into one subject area, it can be associated with more than one domain. This is not a recommended approach, but sometimes it's unavoidable.
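One way to keep track of this in practice is a simple catalog that maps each data set to the system holding it and the domains it belongs to. The sketch below is a minimal illustration under that assumption: it shows which systems hold data for each domain and flags data sets linked to more than one domain so they can be reviewed as exceptions. The catalog entries and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical catalog entries: (data set, system, domains it is associated with).
CATALOG: List[Tuple[str, str, Set[str]]] = [
    ("purchase_orders", "ERP", {"Transaction", "Vendor"}),
    ("vendor_contracts", "Document store", {"Vendor"}),
    ("store_sites", "GIS", {"Location"}),
    ("online_orders", "E-commerce platform", {"Transaction"}),
]


def systems_by_domain(catalog: List[Tuple[str, str, Set[str]]]) -> Dict[str, Set[str]]:
    """Collect the systems that hold data for each domain (domains cut across systems)."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for _dataset, system, domains in catalog:
        for domain in domains:
            result[domain].add(system)
    return dict(result)


def multi_domain_datasets(catalog: List[Tuple[str, str, Set[str]]]) -> List[str]:
    """List data sets associated with more than one domain, to review as exceptions."""
    return [dataset for dataset, _system, domains in catalog if len(domains) > 1]


print(sorted(systems_by_domain(CATALOG)["Transaction"]))  # ['E-commerce platform', 'ERP']
print(multi_domain_datasets(CATALOG))                      # ['purchase_orders']
```

Flagging multi-domain data sets, rather than forbidding them, reflects the point above: it's not recommended, but when it happens you want it visible.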