How to get started with Data Governance when the odds are against you?
In the past few years I have been asked to introduce data governance at several clients. It became obvious that there is no magic method that guarantees success in this task. The approach you should take depends a lot on how your client is organized at the moment they experience their need for more governance.
Many organizations sense that they need to get organized in a better way and follow the talks on “Data is the new oil” and “don’t turn your data lake into a data swamp”, but they often do not grasp the essence of what data governance is actually about. Business people often think they shouldn’t be involved, as data is considered a technical problem; technical people often think they are doing fine, as performance is great and they consider their tests successful, even if they sometimes lack insight into the true meaning of a business term.
When I got a new assignment at one of our clients, I took my preferred approach of confronting bottom-up documentation with top-down validation. Typically this allows you to show business and IT what has been produced in the past few years and how requirements didn’t filter through efficiently. In addition, my internal colleague and I worked out some standard deliverables that would serve the safekeeping of data assets.
This time, however, the formula I had tested many times before didn’t work out as expected: neither the business teams nor the IT teams showed any interest in my activities. They were happy with the existing silos and didn’t care about a common understanding or the required data deliverables. This attitude wasn’t shared by the entire organization, though, and it certainly didn’t help the teams that worked transversally. Although of great importance, the needs of these transversal teams were ignored by the majority of the organization.
I was puzzled by this situation; how could we break this status quo and move towards a data-driven organization as required by the mission statement?
This meant I had to start from scratch. Failing the first part of the mission did teach me a lot about what the client was actually looking for and how people thought. I put all the elements together and reflected with my internal colleague on how we could move forward.
My internal colleague had a more strategic insight into the organization. He had only recently taken up his position and was discovering, from his perspective, that things weren’t quite as efficient as he had expected them to be.
Given the above, we decided to switch gears and take a different approach. My colleague dived into the organization’s strategic documentation and I looked into how teams could be optimized in order to guarantee better data governance. This led to a presentation in which we could lay out the short-, medium- and long-term steps needed to achieve the data centricity management was keen on.
The very first step had to be evangelizing the need for governance of enterprise IT, including data governance, to higher and middle management. Although a vision was defined in the mission statement, key stakeholders had been replaced and the newly assigned C-level wasn’t necessarily aligned with the vision produced by their predecessors; the level below was yet to be convinced of its use.
Building on this first evangelization, we argued that introducing more formalized demand management within the business organization, or on the business side of IT, would be a great asset, as development was often outsourced and little or no quality control on the actual deliveries was in place.
Next, we reasoned that with this demand management organization in place, we could introduce the ideas of a Data Mesh [1], in which data products can be developed independently (thus respecting the existing silos in the organization) but must also comply with the policies put in place by the organization (in this case, issued by the newly established demand organization).
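To make this concrete, here is a minimal sketch of what such a policy check could look like. The DataProduct descriptor, the policy list and the field names are all hypothetical; they merely illustrate how domain teams could publish products independently while the demand organization validates them against central policies.

```python
from dataclasses import dataclass, field

# Hypothetical descriptor for a data product published by a domain team.
@dataclass
class DataProduct:
    name: str
    owner: str                      # the domain (silo) that develops and runs it
    documentation_url: str = ""
    classification: str = ""        # e.g. "public", "internal", "confidential"
    quality_checks: list = field(default_factory=list)

# Organization-wide policies coming from the demand organization.
# Each policy is a (description, predicate) pair evaluated against a product.
POLICIES = [
    ("every product has a named owner", lambda p: bool(p.owner)),
    ("every product is documented", lambda p: bool(p.documentation_url)),
    ("every product is classified", lambda p: p.classification in {"public", "internal", "confidential"}),
    ("every product declares at least one quality check", lambda p: len(p.quality_checks) > 0),
]

def policy_violations(product: DataProduct) -> list:
    """Return the descriptions of all policies the product fails to meet."""
    return [desc for desc, check in POLICIES if not check(product)]

if __name__ == "__main__":
    product = DataProduct(name="customer-orders", owner="sales-domain")
    for violation in policy_violations(product):
        print(f"{product.name}: policy violated -> {violation}")
```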
Finally, we introduced the ideas of ‘Service Level Agreements’ [2] (identifying the needs of the business), ‘Service Level Objectives’ [3] (identifying the criteria deliveries should comply with) and ‘Service Level Indicators’ [4] (identifying the actual metrics that prove a Service Level Objective has been met). Within this SLA/SLO/SLI framework, which reaches wider than data management alone, we could easily fit business rules (which belong in the SLA/SLO) and data quality rules (which are in fact genuine SLOs/SLIs).
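As an illustration, the sketch below shows how a data quality rule can be expressed as an SLI and wrapped in an SLO. The rule, the field names and the 99% target are assumptions made up for the example, not something taken from the client’s actual agreements.

```python
from dataclasses import dataclass
from typing import Callable

# A Service Level Indicator: a concrete metric computed from the delivered data.
@dataclass
class SLI:
    name: str
    measure: Callable[[list], float]   # returns a value between 0 and 1

# A Service Level Objective: a target the SLI has to meet.
@dataclass
class SLO:
    description: str
    sli: SLI
    target: float                      # e.g. 0.99 means 99% of records must pass

    def is_met(self, records: list) -> bool:
        return self.sli.measure(records) >= self.target

# A data quality rule ("every order must carry a customer id") expressed as an SLI.
completeness = SLI(
    name="customer_id completeness",
    measure=lambda rows: sum(1 for r in rows if r.get("customer_id")) / max(len(rows), 1),
)

# The business rule from the SLA, translated into an SLO on top of that SLI.
slo = SLO(description="at least 99% of orders carry a customer id",
          sli=completeness, target=0.99)

orders = [{"customer_id": "C1"}, {"customer_id": "C2"}, {"customer_id": None}]
print(slo.is_met(orders))   # False: only 2 out of 3 records pass the rule
```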
The above widened the scope of our project beyond data management, which made it more acceptable to the strongly process-oriented organization. This is where we are right now.
In the near future, we plan to continue down this track and introduce data observability into the operational runtime, especially at the level of cross-functional APIs. This includes measuring the following aspects [5]; a small sketch of what such checks could look like follows the list.
- Freshness: how up-to-date is your data?
- Distribution: does your data fall within an accepted range?
- Volume: is your data complete?
- Schema: has the structure of your data changed?
- Lineage: what are the upstream and downstream impacts of data downtime?
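As a rough illustration, the sketch below covers four of these pillars (freshness, distribution, volume and schema) as plain checks against a delivered dataset; lineage typically requires metadata from the pipeline itself and is left out here. All thresholds, column names and ranges are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds that would be agreed with the data custodian.
MAX_AGE = timedelta(hours=24)                  # freshness
MIN_ROW_COUNT = 1000                           # volume
EXPECTED_COLUMNS = {"order_id", "customer_id", "amount", "created_at"}  # schema

def check_freshness(last_loaded_at: datetime) -> bool:
    """Freshness: the latest load must be more recent than MAX_AGE."""
    return datetime.now(timezone.utc) - last_loaded_at <= MAX_AGE

def check_volume(row_count: int) -> bool:
    """Volume: the delivery must contain at least the expected number of rows."""
    return row_count >= MIN_ROW_COUNT

def check_schema(columns: set) -> bool:
    """Schema: the structure of the delivery must not have drifted."""
    return columns == EXPECTED_COLUMNS

def check_distribution(amounts: list) -> bool:
    """Distribution: values must fall within an accepted range."""
    return all(0 <= a <= 10_000 for a in amounts)

if __name__ == "__main__":
    results = {
        "freshness": check_freshness(datetime.now(timezone.utc) - timedelta(hours=3)),
        "volume": check_volume(1250),
        "schema": check_schema({"order_id", "customer_id", "amount", "created_at"}),
        "distribution": check_distribution([12.5, 99.0, 4500.0]),
    }
    for pillar, ok in results.items():
        print(f"{pillar}: {'OK' if ok else 'ALERT'}")
```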
In doing so, we would establish another aspect of Data Mesh theory, namely what is called “Data Reliability Engineering” [6] (DRE), which would form a great base for better data governance. DRE would be able to identify the SLIs that underpin the objectives defined by the Data Custodian.
In an organization in which IT is mostly delivered by third parties, this would provide a way to regain control, from within the organization, over whatever is being delivered. For my client this would mean having a great tool to transform their siloed organization into an innovative, data-driven one without changing much of their current way of working.
Talking about a new oil or a swamp that didn’t relate to what they were doing didn’t help them; pointing out the weaknesses in the organization with regard to quality as a whole, and how data governance could play a major role there, did. It also showcased what it really means to think big and start small: the first baby steps you need to take may not even relate directly to data management itself, but they should lead to an improved context in which data management and data governance alike can be accepted as capabilities that assure the quality of the solutions being built. This is definitely something I’ll take with me for the rest of my career.
References:
[1] Zhamak Dehghani, “Data Mesh Principles and Logical Architecture”, martinfowler.com.
[2] Muhammad Raza, “Service Level Agreement (SLA) Examples and Templates”, BMC Software Blogs.
[3] Muhammad Raza, “Service Level Objectives (SLOs) Explained”, BMC Software Blogs.
[4] Stephen Watts, “A Primer on Service Level Indicator (SLI) Metrics”, BMC Software Blogs.
[5] Monte Carlo, “The Big Book of Data Observability”, https://resources.montecarlodata.com/resources/the-big-book-of-data-observability-ebook.
[6] “How Data Reliability Engineering Can Solve Today’s Data Challenges”, DATAVERSITY.