What is a data domain?

Determining your data domains is an important part of your data strategy. So what is a data domain? 

It can actually mean a couple of things, depending on whether we look at it from the point of view of data management and database management, or from the point of view of data governance. Think of it as looking at it from the technical side or from the business side. You might say, "George, why do we care about both? Let's just focus on the data governance side." Well, I think that you need to be aware of both.

Even if you're in data governance, you need to understand the technical side, because otherwise, when you talk to technical data stewards, data custodians, and IT, they might use the term differently than you do. Even when you're talking to vendors, I think it's good to understand both views of the term. Yes, I know, it's frustrating for the same term to have different meanings, but when you work in data governance it's something you'll get used to, as clarifying these differences is one of the things data governance tries to do.

Data domain (database management)

From a database management point of view, or better yet, a data modeling point of view, a data domain represents the collection of values that a data element may contain. A better way to understand this is through an example. Imagine an online form that we fill in, with a drop-down field for gender.

When we click on that drop-down we might be getting some options, such as the following:

  • Male
  • Female
  • Non-binary
  • Not specified

Of course there could be other options, depending on your definition for gender. That's not the point. The idea is that we would have these fixed options. When we record this in a table of a database, the value assigned to gender can only be one of these 4 values. So we say that the data domain for the gender column is "male", "female", "non-binary", or "not specified".

GENDER_TABLE

  • MALE
  • FEMALE
  • NON-BINARY
  • NOT SPECIFIED
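To make this concrete, here is a minimal Python sketch of the same idea. The `Gender` class and the `is_in_domain` helper are hypothetical names made up for this illustration; the point is only that a value is accepted if, and only if, it belongs to the fixed set of allowed values.

```python
from enum import Enum


class Gender(Enum):
    """The data domain of the gender column: the only values it may contain."""
    MALE = "male"
    FEMALE = "female"
    NON_BINARY = "non-binary"
    NOT_SPECIFIED = "not specified"


def is_in_domain(value: str) -> bool:
    """Return True if value is one of the allowed gender values."""
    return value in {member.value for member in Gender}


print(is_in_domain("female"))   # True
print(is_in_domain("unknown"))  # False
```

In a real database, the same rule would typically be enforced with a lookup table or a constraint on the column rather than in application code.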

Data domain (data governance)

From a data governance perspective, data domain means something else. Here, a data domain is "a logical grouping of items of interest to the organization, or areas of interest within the organization".

You can think of data domains as high-level categories of data for the purpose of assigning accountability and responsibility for the data. By the way, a data domain is also called a "subject area" or a "data concept", so you might encounter either term. Within data governance, they refer to the same thing.

Note that some people use "data domain" to mean the same thing as a data set. That's not accurate, as a data domain can contain multiple data sets, as long as those data sets represent the same area of interest within the organization.

If this is still clear as mud, let's look at some examples. 

Data domain examples

  • Customer
  • Product (or Service)
  • Location
  • Vendor (or Supplier)
  • Transaction (or Order, or Sale)
  • Legal

An average organization would have anywhere between 5 and 10 data domains, and they aren't always these, though these are usually the most common ones. In the end, it really depends on the industry that you're part of.

Let's look at some industry-specific data domains.

In the education sector, you might have:

  • Student
  • Research
  • Faculty
  • Alumni
  • Advancement

In the healthcare sector, you might have:

  • Patient
  • Facility
  • Medical procedure

In the insurance sector, you might encounter:

  • Provider
  • Member

In any of these sectors you could also have some of the previous data domains. For example, I'm sure that all 3 sectors would have "Location", "Transaction", and "Legal" as data domains.

Data sub-domain

There's also the concept of a data sub-domain. Typically each data domain will have anywhere between 3 and 10 data sub-domains.

What is a sub-domain? It's simply a way to divide that data domain even further into other categories.

There are some considerations, though:

  • The sub-domain is unique
  • There's a 1 to 1 relationship between a data sub-domain and its data domain, meaning each sub-domain belongs to exactly one data domain
  • It inherits the characteristics of its data domain
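These considerations can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the `DataDomain` class, the `parent_of` helper, and the `owner` field, which stands in for whatever characteristics a sub-domain inherits in your organization.

```python
from dataclasses import dataclass, field


@dataclass
class DataDomain:
    name: str
    owner: str  # hypothetical characteristic that sub-domains inherit
    sub_domains: list[str] = field(default_factory=list)


def parent_of(sub_domain: str, domains: list[DataDomain]) -> DataDomain:
    """Return the single data domain that a sub-domain belongs to.

    Raises ValueError if the sub-domain is missing or sits under several domains.
    """
    parents = [d for d in domains if sub_domain in d.sub_domains]
    if len(parents) != 1:
        raise ValueError(
            f"{sub_domain!r} must belong to exactly one data domain, found {len(parents)}"
        )
    return parents[0]


# Made-up owners; the sub-domain names come from the examples in this article.
domains = [
    DataDomain("Customer", owner="Head of Sales",
               sub_domains=["Individual", "Corporation", "Household"]),
    DataDomain("Location", owner="Head of Facilities",
               sub_domains=["Site", "Building", "Warehouse"]),
]

# "Building" inherits the owner of its parent domain, "Location".
print(parent_of("Building", domains).owner)  # Head of Facilities
```

A check like this makes the uniqueness rule explicit: a sub-domain listed under two domains, or under none, is flagged instead of being silently accepted.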

Data sub-domain examples

Let me provide you with some sub-domain examples for some of the data domains mentioned above.

Customer

  • Individual
  • Corporation
  • Government
  • Charity
  • Group
  • Household

Vendor

  • Vendor specification
  • Pricing
  • Service level agreement

Location

  • Site
  • Geographical area
  • Building
  • Office
  • Warehouse
  • Outdoor space

Conclusion

What you should remember is that these data domains, and sub-domains, are a way of grouping the most important data of an organization, and they go across business units and systems. So for the same domain you might have different stakeholders from different lines of business and departments, and the data can be found in, produced by, or consumed by different systems.

That being said, the reality can be a bit more complicated: when data doesn’t perfectly slot into one subject area or another, it can be associated with more than one domain. This is not a recommended approach, but it is sometimes unavoidable.
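As a rough illustration of both points, the Python sketch below groups a handful of data sets by domain and flags the ones assigned to more than one. The data set names, systems, and domain assignments are all made up for this example.

```python
from collections import defaultdict

# Hypothetical data sets, the systems that hold them, and their assigned domain(s).
data_sets = [
    {"name": "crm_contacts", "system": "CRM", "domains": ["Customer"]},
    {"name": "billing_accounts", "system": "ERP", "domains": ["Customer"]},
    {"name": "product_catalog", "system": "ERP", "domains": ["Product"]},
    {"name": "web_analytics", "system": "Analytics", "domains": ["Customer", "Product"]},
]

# Group data set names by domain: one domain can span several systems.
by_domain = defaultdict(list)
for ds in data_sets:
    for domain in ds["domains"]:
        by_domain[domain].append(ds["name"])

# Data sets associated with more than one domain deserve a closer look.
multi_domain = [ds["name"] for ds in data_sets if len(ds["domains"]) > 1]

print(dict(by_domain))
print(multi_domain)  # ['web_analytics']
```

Here the Customer domain pulls data from three different systems, and the one data set that sits in two domains is easy to spot and review.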


  • Thanks George for your explanation. I wonder if metrics/KPIs are a data domain on their own or fall under different data domains?

  • Thank you so much for a very clear explanation!

  • OK, but what is it actually used for? A table of customer names is part of the customer domain. A list of products we sell is part of the product domain. So what? What do we do with that information?

    • A lot of things. In data governance it’s used as a way to categorize different processes, standards, policies, and assign data stewards and data owners. It’s a great way to make sense of the vastness of an organization’s data and help set its focus for its data governance efforts.

  • David Jaques-Watson says:

    One more example: I’ve used Group in the past in an emergency management system. The situation was if an emergency occurred with an overseas tour group (holidaymakers travelling on a bus which crashes, for example) the *only* thing linking these people together is they are part of the tour group! So, I added Group as a subtype of Party (along with the usual suspects of Person and Organisation).

  • David Jaques-Watson says:

    Hi George! I have seen one other peculiarly technical use of “domain” in relation to data: erwin DM uses “Domain” to refer to an object in a data model which can be used to pass properties (like datatype, length, etc.) to an attribute or column.

    For example, if all your ID fields are 10 digits long, you can create “ID” as a domain object with a logical datatype of NUMERIC(10,0). Then for every ID field you create, you set its “Parent Domain” to ID and its logical datatype will also be set to NUMERIC(10,0).

    (Yet another example of how use of the same term for different things is one of the banes of data management!)

  • Would you provide an example where there is not a 1 to 1 relationship between the data domain and data sub-domain?

    • There should always be a 1 to 1 relationship, but for example, you could have a “Marketing” sub-domain which would encompass marketing related data (ex: web analytics, different types of conversion rates and reach, marketing segments, and even a bunch of data analytical scores). This could be a sub-domain of Customer or Product. It could go under either.

      • Hi George, thanks for this….but how does this relate with company departments and business areas?

        • Hi Emeka, you usually have a matrix with the domain names as rows and departments/ business areas as columns. You can use it to guide you in seeing what stakeholders need to be included for each.

  • Thanks for this George, as always very clearly and simply explained so easy to understand. Love your YouTube channel as well.

  • Very clear and useful. Thank you!!

    About the author 

    George Firican

    George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
