How to get started with Data Governance when the odds are against you?
In the past few years I have been asked to introduce data governance at several clients. It became obvious that there is no magic method that guarantees success in this task. The approach you should take depends a lot on how your client is organized at the moment they experience their need for more governance.
Many organizations sense that they need to get organized in a better way and follow the talks on “Data is the new oil” and “don’t turn your data lake into a data swamp”, but they often do not grasp the essence of what data governance is actually about. Business people often think they shouldn’t be involved, as data is considered a technical problem; technical people often think they are doing fine, as performance is great and they consider their tests successful, even if they sometimes lack insight into the true meaning of a business term.
When I got a new assignment at one of our clients, I took my preferred approach of confronting bottom-up documentation with top-down validation. Typically this allows you to show business and IT what has been produced in the past few years and how requirements didn’t filter through efficiently. In addition, my internal colleague and I worked out some standard deliverables that would serve the safekeeping of data assets.
This time, however, the formula I had tested many times before didn’t work out as expected: neither the business teams nor the IT teams showed any interest in my activities. They were happy with the existing silos and didn’t care about a common understanding or the required data deliverables. This attitude wasn’t shared by the entire organization, though, and it certainly didn’t help the teams that worked transversally. Although of great importance, the needs of these transversal teams were ignored by the majority of the organization.
I was puzzled by this situation; how could we break this status quo and move towards a data-driven organization as required by the mission statement?
This meant I had to start from scratch. Failing the first part of the mission did teach me a lot about what the client was actually looking for and how people thought. I put all the elements together and reflected with my internal colleague on how we could move forward.
My internal colleague had a more strategic insight into the organization. He had only recently taken up his position and was discovering, from his perspective, that things weren’t quite as efficient as he had expected them to be.
Given the above, we decided to switch gears and take a different approach. My colleague dived into the organization’s strategic documentation and I looked into how teams could be optimized in order to guarantee better data governance. This led to a presentation in which we could lay out the short-, medium- and long-term steps needed to achieve the data centricity management was keen on.
The very first step had to be evangelizing the need for governance of enterprise IT, including data governance, to higher and middle management. Although a vision was defined in the mission statement, key stakeholders had been replaced and the newly assigned C-level wasn’t necessarily aligned with the vision produced by their predecessors; the level below was yet to be convinced of its use.
Building on this first evangelization, we argued that introducing more formalized demand management within the business organization, or on the business side of IT, would be a great asset, as development was often outsourced and little or no quality control on the actual deliveries was in place.
Next, we reasoned that with this demand management organization in place, we could introduce the ideas of a Data Mesh [1], in which data products can be developed independently (thus respecting the existing silos in the organization) but must also comply with the policies put in place by the organization (in this case, issued by the newly established demand organization).
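To make this concrete, here is a minimal sketch of what such a policy check could look like. The DataProduct descriptor, the policy list and the field names are all hypothetical; they merely illustrate how domain teams could publish products independently while the demand organization validates them against central policies.

```python
from dataclasses import dataclass, field

# Hypothetical descriptor for a data product published by a domain team.
@dataclass
class DataProduct:
    name: str
    owner: str                      # the domain (silo) that develops and runs it
    documentation_url: str = ""
    classification: str = ""        # e.g. "public", "internal", "confidential"
    quality_checks: list = field(default_factory=list)

# Organization-wide policies coming from the demand organization.
# Each policy is a (description, predicate) pair evaluated against a product.
POLICIES = [
    ("every product has a named owner", lambda p: bool(p.owner)),
    ("every product is documented", lambda p: bool(p.documentation_url)),
    ("every product is classified", lambda p: p.classification in {"public", "internal", "confidential"}),
    ("every product declares at least one quality check", lambda p: len(p.quality_checks) > 0),
]

def policy_violations(product: DataProduct) -> list:
    """Return the descriptions of all policies the product fails to meet."""
    return [desc for desc, check in POLICIES if not check(product)]

if __name__ == "__main__":
    product = DataProduct(name="customer-orders", owner="sales-domain")
    for violation in policy_violations(product):
        print(f"{product.name}: policy violated -> {violation}")
```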
Finally, we introduced the ideas of ‘Service Level Agreements’ [2] (identifying the needs of the business), ‘Service Level Objectives’ [3] (identifying the criteria deliveries should comply with) and ‘Service Level Indicators’ [4] (identifying the actual metrics that prove a Service Level Objective has been met). Within this SLA/SLO/SLI framework, which reaches wider than data management alone, we could easily fit business rules (which belong in the SLA/SLO) and data quality rules (which are in fact genuine SLOs/SLIs).
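As an illustration, the sketch below shows how a data quality rule can be expressed as an SLI and wrapped in an SLO. The rule, the field names and the 99% target are assumptions made up for the example, not something taken from the client’s actual agreements.

```python
from dataclasses import dataclass
from typing import Callable

# A Service Level Indicator: a concrete metric computed from the delivered data.
@dataclass
class SLI:
    name: str
    measure: Callable[[list], float]   # returns a value between 0 and 1

# A Service Level Objective: a target the SLI has to meet.
@dataclass
class SLO:
    description: str
    sli: SLI
    target: float                      # e.g. 0.99 means 99% of records must pass

    def is_met(self, records: list) -> bool:
        return self.sli.measure(records) >= self.target

# A data quality rule ("every order must carry a customer id") expressed as an SLI.
completeness = SLI(
    name="customer_id completeness",
    measure=lambda rows: sum(1 for r in rows if r.get("customer_id")) / max(len(rows), 1),
)

# The business rule from the SLA, translated into an SLO on top of that SLI.
slo = SLO(description="at least 99% of orders carry a customer id",
          sli=completeness, target=0.99)

orders = [{"customer_id": "C1"}, {"customer_id": "C2"}, {"customer_id": None}]
print(slo.is_met(orders))   # False: only 2 out of 3 records pass the rule
```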
The above widened the scope of our project beyond data management, which made it more acceptable to the strongly process-oriented organization. This is where we are right now.
In the near future, we plan to continue down this track and introduce data observability into the operational runtime, especially at the level of cross-functional APIs. This includes measuring the following aspects [5]; a small sketch of what such checks could look like follows the list.
- Freshness: how up-to-date is your data?
- Distribution: does your data fall within an accepted range?
- Volume: is your data complete?
- Schema: has the structure of your data changed?
- Lineage: what are the upstream and downstream impacts of data downtime?
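As a rough illustration, the sketch below covers four of these pillars (freshness, distribution, volume and schema) as plain checks against a delivered dataset; lineage typically requires metadata from the pipeline itself and is left out here. All thresholds, column names and ranges are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds that would be agreed with the data custodian.
MAX_AGE = timedelta(hours=24)                  # freshness
MIN_ROW_COUNT = 1000                           # volume
EXPECTED_COLUMNS = {"order_id", "customer_id", "amount", "created_at"}  # schema

def check_freshness(last_loaded_at: datetime) -> bool:
    """Freshness: the latest load must be more recent than MAX_AGE."""
    return datetime.now(timezone.utc) - last_loaded_at <= MAX_AGE

def check_volume(row_count: int) -> bool:
    """Volume: the delivery must contain at least the expected number of rows."""
    return row_count >= MIN_ROW_COUNT

def check_schema(columns: set) -> bool:
    """Schema: the structure of the delivery must not have drifted."""
    return columns == EXPECTED_COLUMNS

def check_distribution(amounts: list) -> bool:
    """Distribution: values must fall within an accepted range."""
    return all(0 <= a <= 10_000 for a in amounts)

if __name__ == "__main__":
    results = {
        "freshness": check_freshness(datetime.now(timezone.utc) - timedelta(hours=3)),
        "volume": check_volume(1250),
        "schema": check_schema({"order_id", "customer_id", "amount", "created_at"}),
        "distribution": check_distribution([12.5, 99.0, 4500.0]),
    }
    for pillar, ok in results.items():
        print(f"{pillar}: {'OK' if ok else 'ALERT'}")
```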
In doing so, we would establish another aspect of Data Mesh theory, namely what is called “Data Reliability Engineering” [6] (DRE), which would form a great base for better data governance. DRE would be able to identify the SLIs that underpin the objectives defined by the Data Custodian.
In an organization in which IT is mostly delivered by third parties, this would provide a way to regain control, from within the organization, over whatever is being delivered. For my client this would mean having a great tool to transform their siloed organization into an innovative, data-driven one without changing much of their current way of working.
Talking about a new oil or a swamp that didn’t relate to what they were doing didn’t help them; pointing out the weaknesses in the organization with regard to quality as a whole, and how data governance could play a major role there, did. It also showcased what it really means to think big and start small: the first baby steps you need to take may not even relate directly to data management itself, but they should lead to an improved context in which data management and data governance alike can be accepted as capabilities that assure the quality of the solutions being built. This is definitely something I’ll take with me for the rest of my career.
References:
[1] Zhamak Dehghani, “Data Mesh Principles and Logical Architecture”, martinfowler.com.
[2] Muhammad Raza, “Service Level Agreement (SLA) Examples and Templates”, BMC Software Blogs.
[3] Muhammad Raza, “Service Level Objectives (SLOs) Explained”, BMC Software Blogs.
[4] Stephen Watts, “A Primer on Service Level Indicator (SLI) Metrics”, BMC Software Blogs.
[5] Monte Carlo, “The Big Book of Data Observability”, https://resources.montecarlodata.com/resources/the-big-book-of-data-observability-ebook.
[6] “How Data Reliability Engineering Can Solve Today’s Data Challenges”, DATAVERSITY.