Main responsibilities of a data steward

I like to think of the data steward as the unsung hero of data. Truth be told, without them, data scientists wouldn't be able to understand and trust the data they are using, AI/ML wouldn't output correct results, and a company wouldn't be able to become data-driven.

 So who is this unsung hero? What do they do? Let's put the spotlight on them.

To put it simply, a Data Steward is responsible for maintaining and understanding an organization's data and metadata. Their overall objective is to ensure the quality, compliance, clarity, and understanding of the data that they oversee.

This individual comes from the business side and has experience and knowledge of the data domain they are assigned to. There are other types of data stewards, but the data domain steward is the most common one. In any case, this is the person you go to and ask:

  • Do you know where I could find this data that I need?
  • Can you please help me understand what this data is all about?
  • What does this business term mean?
  • How much should I trust the quality of this data?
  • Can I use this data for this project?

As I call out these questions, I'm sure you can already picture the colleague of yours who is able to answer them. Maybe it's even you. And this is just scratching the surface, of course.

You might also ask yourself, "Wouldn't these answers also come from a tool such as a Business Glossary, a data catalog, or a data dictionary?" Yes, absolutely, but only because the data steward helped create that information and add it to these tools.


So what are the main responsibilities of a data steward?

There are plenty, but for the most part they fall into the following three categories:

1. Data quality

  • Help create data quality requirements, rules, and standards
  • Validate and monitor the level of data quality
  • Contribute to developing the business rules that govern their data domain (e.g., ETL rules)
  • Help establish data quality metrics
  • Help create data quality audits, controls, procedures, and policies
  • Help determine the root cause of data quality issues
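
To make "rules" and "metrics" a bit more concrete, here is a minimal sketch in Python of how a data quality rule and a simple pass-rate metric might be expressed. Everything in it is an assumption for illustration: the `QualityRule` class, the `pass_rate` function, the rule names, the field names, and the 0 to 120 age range are all hypothetical, and a real organization would define its own rules in whatever data quality tool it uses.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical structure: a named rule plus a check that a record must pass.
@dataclass
class QualityRule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


def pass_rate(records: list[dict], rule: QualityRule) -> float:
    """Share of records that satisfy the rule (a simple data quality metric)."""
    if not records:
        return 1.0  # assumption: an empty dataset has nothing failing
    passed = sum(1 for record in records if rule.check(record))
    return passed / len(records)


# Illustrative rules a steward might help define for a "customer" domain.
rules = [
    QualityRule("email_present", lambda r: bool(r.get("email"))),
    QualityRule(
        "age_in_range",
        lambda r: r.get("age") is not None and 0 <= r["age"] <= 120,
    ),
]

# Made-up sample records, only to exercise the rules.
records = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 29},
    {"email": "c@example.com", "age": 150},
    {"email": "d@example.com", "age": None},
]

# One metric per rule; these numbers could be tracked over time.
report = {rule.name: pass_rate(records, rule) for rule in rules}
```

The point of the sketch is the division of labor: the steward supplies the meaning of each rule (what counts as a valid age, whether email is mandatory), while the monitoring itself can be automated.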

2. Metadata management

  • Create business metadata: they define business terms and populate the Business Glossary
  • Provide context and guidance on the meaning of data
  • Promote the use of approved data and metadata definitions and reference data
  • Work with data custodians on documenting the technical metadata
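
As a rough illustration of what "populating the Business Glossary" and "promoting approved definitions" can look like in practice, the following Python sketch models a glossary in which only approved terms are returned to users. The `GlossaryTerm` and `BusinessGlossary` classes, the "draft" and "approved" statuses, and the sample term are hypothetical and not taken from any particular glossary tool.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical shape of a business glossary entry.
@dataclass
class GlossaryTerm:
    name: str
    definition: str
    steward: str           # who is accountable for the definition
    status: str = "draft"  # assumption: terms start as drafts


class BusinessGlossary:
    def __init__(self) -> None:
        self._terms: dict[str, GlossaryTerm] = {}

    def add(self, term: GlossaryTerm) -> None:
        # Store terms case-insensitively by name.
        self._terms[term.name.lower()] = term

    def approve(self, name: str) -> None:
        self._terms[name.lower()].status = "approved"

    def lookup(self, name: str) -> Optional[GlossaryTerm]:
        """Return the term only if its definition has been approved."""
        term = self._terms.get(name.lower())
        if term is not None and term.status == "approved":
            return term
        return None


glossary = BusinessGlossary()
glossary.add(
    GlossaryTerm(
        name="Active Donor",
        definition="A donor with at least one gift in the last 24 months.",
        steward="Fundraising data steward",
    )
)

before = glossary.lookup("active donor")  # still a draft, so not returned
glossary.approve("Active Donor")
after = glossary.lookup("active donor")   # approved, so returned
```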

3. Regulatory

  • Help with data classification
  • Determine the retention, archival, and disposal requirements of data
  • Define data security requirements
  • Translate regulatory rules into data policies and standards
  • Establish guidelines on data usage to ensure data privacy controls are enforced
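
Retention and disposal requirements are easier to enforce once they are written down as explicit policies tied to a data classification. Here is a minimal Python sketch of that idea. The classification labels, the retention periods, and the `RetentionPolicy` and `due_action` names are all invented for illustration; actual values come from the regulations and policies that apply to your organization.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical policy record: how long to keep data and what to do afterwards.
@dataclass(frozen=True)
class RetentionPolicy:
    classification: str
    retention_days: int
    action: str  # what happens once the retention period has passed


# Illustrative values only, not regulatory guidance.
POLICIES = {
    "public": RetentionPolicy("public", 365 * 10, "archive"),
    "confidential": RetentionPolicy("confidential", 365 * 7, "archive"),
    "personal": RetentionPolicy("personal", 365 * 2, "dispose"),
}


def due_action(classification: str, created: date, today: date) -> str:
    """Return the policy action if retention has expired, else 'retain'."""
    policy = POLICIES[classification]
    expires = created + timedelta(days=policy.retention_days)
    return policy.action if today >= expires else "retain"


personal_result = due_action("personal", date(2020, 1, 1), date(2023, 1, 1))
confidential_result = due_action("confidential", date(2020, 1, 1), date(2023, 1, 1))
```

In this sketch the personal record is past its two-year period and is flagged for disposal, while the confidential record is still within its seven-year period and is retained.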

Conclusion

As I mentioned before, a data steward is usually a subject matter expert from the business side. They have experience and knowledge of the data domain they represent.

I've also seen data stewardship responsibilities assigned to data analysts or data management professionals who have a good understanding of the technical side of things. Ideally, though, stewards are recruited from the ranks of the business, because the knowledge and insight they bring is valuable in everything they have to do.

Who are your data stewards? What are their main responsibilities?

  • Hi George, thanks a lot for this great description of the Data Steward! My question is: depending on the data lifecycle, how is the data team organized, and what are the main responsibilities of each data role (Data Steward, Data Analyst, Data Scientist, Data Owner, Data Engineer, Data Custodian…)? Do you have a big picture that illustrates the organization of the data team and where each data role does its work in the data lifecycle?

    About the author 

    George Firican

    George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
