I like to think of the data steward as the unsung hero of data. Truth be told, without them, data scientists wouldn't be able to understand and trust the data they are using, AI/ML wouldn't output correct results, and a company wouldn't be able to become data-driven.
So who is this unsung hero? What do they do? Let's put the spotlight on them.
To put it simply, a Data Steward is responsible for the maintenance and understanding of an organization's data and metadata. Their overall objective is to ensure the quality, compliance, clarity, and understanding of the data they oversee.
This individual comes from the business side and has experience and knowledge of the data domain they are assigned to. There are other types of data stewards, but the data domain steward is the most common one. That aside, this is the person you go to and ask:
- Do you know where I could find this data that I need?
- Can you please help me understand what this data is all about?
- What does this business term mean?
- How much should I trust the quality of this data?
- Can I use this data for this project?
As I'm calling out these questions, I'm sure you can already see the face of that colleague of yours who's able to answer them. Maybe it's even you. And this is just scratching the surface, of course.
You might also ask yourself, "Wouldn't these answers also come from a tool such as a Business Glossary, a data catalog, or a data dictionary?" Yes, absolutely, but only because the data steward helped create that information and add it to these tools.
So what are the main responsibilities of a data steward?
There are plenty, but for the most part they fall into the following three categories:
1. Data quality
- Help create data quality requirements, rules, and standards
- Validate and monitor the level of data quality
- Contribute to developing the business rules that govern their data domain (e.g., ETL rules)
- Help establish data quality metrics
- Help create data quality audits, controls, procedures, and policies
- Help determine the root cause of data quality issues
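To make the data quality items above a bit more concrete, here is a minimal Python sketch of what a rule and a metric could look like once a steward has defined them. Everything in it is an assumption made for illustration: the `QualityRule` class, the `pass_rate` function, the sample customer records, and the 90% threshold are hypothetical and don't come from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QualityRule:
    """A hypothetical data quality rule agreed on with the data steward."""
    name: str
    column: str
    check: Callable[[Optional[object]], bool]


def pass_rate(rows, rule):
    """Return the share of rows whose value in rule.column passes the rule."""
    if not rows:
        return 1.0  # assumption: an empty data set has nothing failing
    passed = sum(1 for row in rows if rule.check(row.get(rule.column)))
    return passed / len(rows)


# Made-up sample records for a "customer" data domain.
customers = [
    {"customer_id": 1, "email": "ana@example.com", "country": "CA"},
    {"customer_id": 2, "email": None, "country": "US"},
    {"customer_id": 3, "email": "li@example.com", "country": "Canada"},
    {"customer_id": 4, "email": "sam@example.com", "country": "US"},
]

rules = [
    QualityRule("email is present", "email",
                lambda v: v is not None and v != ""),
    QualityRule("country is a 2-letter code", "country",
                lambda v: isinstance(v, str) and len(v) == 2 and v.isupper()),
]

THRESHOLD = 0.9  # assumed target; the real one would come from the steward

# The metric: pass rate per rule, compared against the agreed threshold.
report = {rule.name: pass_rate(customers, rule) for rule in rules}
for name, rate in report.items():
    status = "OK" if rate >= THRESHOLD else "INVESTIGATE"
    print(f"{name}: {rate:.0%} ({status})")
```

With this sample data, both rules pass on three of the four records, so both land below the threshold and would be flagged for root cause analysis. The code itself is the easy part. Deciding which rules matter and what threshold is acceptable is where the data steward comes in.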
2. Metadata management
- Create business metadata. Basically, they define business terms and populate the Business Glossary
- Provide context and guidance on the meaning of data
- Promote the use of approved data and metadata definitions and reference data
- Work with data custodians on documenting the technical metadata
- Help with data classification
3. Data security, privacy, and compliance
- Determine the retention, archival, and disposal requirements of data
- Define data security requirements
- Translate regulatory rules into data policies and standards
- Establish guidelines on data usage to ensure data privacy controls are enforced
As I mentioned before, a data steward is usually a subject matter expert from the business side. They have experience and knowledge of the data domain they represent.
I've also seen data stewardship responsibilities assigned to data analysts or data management professionals who have a good understanding of the technical side of things. Ideally, though, they are recruited from the ranks of the business, as the knowledge and insight they bring is valuable in everything they do.
Who are your data stewards? What are their main responsibilities?