Adapting data governance to big data in 4 areas

It’s generally accepted that any organization working with data needs to adopt data governance. Readily available frameworks, tools, and services can be adapted to the requirements and environment of your organization, yet when it comes to big data governance, the options are a bit more complex.

Organizations must adapt to big data in four important areas:

1. Data quality

As with traditional data, establishing data quality metrics that are aligned with business objectives enables you to quickly uncover data quality issues and establish remediation plans. Accuracy, completeness, validity, consistency, and integrity still apply to big data, but there are additional data quality characteristics to consider:

Timeliness: Does data arrive on time? Does it meet its refresh schedule? Does it meet the requirements for the time interval from collection to processing to analysis?
Readability: Are the content and format easy to understand? Does the data need to be ready for human consumption in its initial state?
Authorization: Does using the data require certain rights or permissions, and what limitations apply?
Structure: Do you have the technology to transform unstructured data into structured data?
Credibility: What is your confidence in this data?

This last point is particularly important. With any new big data set, you must step back and ask: “Given the context in which I want to use this data, what information about it do I require to have trust or confidence in this data?”

As an example, consider this external source: statistics about what people purchase at restaurants and the prices of menu items over the past five years. Who created the source data? What methodology did they follow in collecting the data? Were only certain cuisines or certain types of restaurants included? Can we identify how the information is organized and if there is any correlation at any level to information already available elsewhere? Has the information been edited or modified by anyone else? Is there any other way to check the veracity of this information?
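Some of the characteristics above can be measured directly. As a minimal sketch of a timeliness check, assuming each record carries hypothetical `collected_at` and `processed_at` timestamps and that six hours is the acceptable delay (both are illustrative choices, not part of any standard), you could compute the share of records that arrive on time:

```python
from datetime import datetime, timedelta

# Hypothetical threshold: the maximum acceptable delay between
# collection and processing. Pick a value that matches your own schedule.
MAX_LATENCY = timedelta(hours=6)


def timeliness_report(records, max_latency=MAX_LATENCY):
    """Return the share of records processed within max_latency, plus the late ones."""
    late = [
        r for r in records
        if r["processed_at"] - r["collected_at"] > max_latency
    ]
    # An empty batch is treated as fully on time to avoid dividing by zero.
    on_time_ratio = 1 - len(late) / len(records) if records else 1.0
    return {"on_time_ratio": on_time_ratio, "late": late}


# Illustrative records only; the field names are assumptions.
records = [
    {"id": 1, "collected_at": datetime(2024, 1, 1, 0, 0),
     "processed_at": datetime(2024, 1, 1, 2, 0)},
    {"id": 2, "collected_at": datetime(2024, 1, 1, 0, 0),
     "processed_at": datetime(2024, 1, 1, 9, 0)},
]
report = timeliness_report(records)
print(report["on_time_ratio"])  # 0.5
```

A ratio like this can serve as one of the data quality metrics you track against business objectives. Credibility, by contrast, usually has to be assessed through questions like the ones above rather than computed.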

2. Metadata management

Ideally, before starting to access big data, ensure your reference information architecture is updated to support big data concepts such as unstructured data streams. Taking a call center’s data as an example, there is useful metadata assigned to the call itself, such as the country of the caller. Different software has different ways of coding that fact, either as the full name of the country or as a two-letter or three-letter ISO 3166-1 code (for a downloadable code list, please see “The single best strategy for improving your mailing addresses” article). Whichever coding is used, you need to ensure this new information is mapped to your organization’s established reference data.
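The country example can be sketched in a few lines. This is only an illustration: the `REFERENCE_COUNTRIES` table below is a hypothetical three-row stand-in for your organization's reference data, and the function names are made up for this example. The idea is to accept any of the three codings and return one canonical code, flagging anything that cannot be mapped.

```python
# Hypothetical reference table keyed by the two-letter country code.
# A real table would come from your organization's reference data.
REFERENCE_COUNTRIES = {
    "CA": {"alpha3": "CAN", "name": "Canada"},
    "DE": {"alpha3": "DEU", "name": "Germany"},
    "US": {"alpha3": "USA", "name": "United States of America"},
}


def build_lookup(reference):
    """Index every known coding (alpha-2, alpha-3, full name) to the alpha-2 code."""
    lookup = {}
    for alpha2, entry in reference.items():
        for key in (alpha2, entry["alpha3"], entry["name"]):
            lookup[key.casefold()] = alpha2
    return lookup


LOOKUP = build_lookup(REFERENCE_COUNTRIES)


def to_reference_code(raw_value):
    """Return the canonical alpha-2 code, or None when the value is unmapped."""
    return LOOKUP.get(raw_value.strip().casefold())


print(to_reference_code("can"))        # CA
print(to_reference_code(" Germany "))  # DE
print(to_reference_code("Atlantis"))   # None
```

Returning `None` instead of guessing keeps unmapped values visible, so they can be routed to whoever maintains the reference data.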

Metadata management capabilities need to be enhanced to encompass relationships between data, people, processes, and data use. To ensure continuity, metadata management also needs to be paired with, and promoted through, education and training programs.


3. Data stewardship

The complexity of big data is also reflected in its stewardship. Data roles such as data steward and data owner are not as clear with large data sets. For example, what department is responsible for clickstream data? Is it marketing — because that data tracks the engagement and reach of potential customers and marketing efforts? Is it finance — because they need to calculate the return on investment? Is it IT — because that department manages the infrastructure and may be responsible for ensuring the proper APIs and tools collect the data?

It’s not advisable to have multiple “owners” responsible for the same data, and with big data, roles may change as that data moves through your ecosystem as well as through its life cycle. Nonetheless, these roles should be well understood.
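One way to keep these roles well understood is to record them and check the record. The sketch below assumes a hypothetical list of assignments and three made-up life cycle stages; it reports every stage of a data set that has either no accountable owner or more than one.

```python
from collections import defaultdict

# Hypothetical assignments: (data set, life cycle stage, accountable owner).
ASSIGNMENTS = [
    ("clickstream", "collection", "IT"),
    ("clickstream", "analysis", "Marketing"),
    ("clickstream", "analysis", "Finance"),
]

# Hypothetical stage names; use whatever stages your ecosystem defines.
REQUIRED_STAGES = ["collection", "analysis", "archival"]


def ownership_gaps(assignments, dataset, stages=REQUIRED_STAGES):
    """Return stages that do not have exactly one accountable owner."""
    owners = defaultdict(set)
    for name, stage, owner in assignments:
        if name == dataset:
            owners[stage].add(owner)
    return {
        stage: sorted(owners[stage])
        for stage in stages
        if len(owners[stage]) != 1
    }


gaps = ownership_gaps(ASSIGNMENTS, "clickstream")
print(gaps)  # {'analysis': ['Finance', 'Marketing'], 'archival': []}
```

Here the owner is allowed to change from stage to stage, which matches the point above, but each stage still resolves to a single accountable party.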

Organizations should:

Identify stakeholders as soon as possible, but be prepared to refine and iterate as you go
Establish timelines and regular checkpoints; begin to measure the area being governed with key milestones
Assign clear accountability to ensure progress is made
Ensure clear measurements are employed

4. Data retention

If you have not already done so, define how long your data is considered current and relevant, then archive everything outside that range. Consider this statistic from the 2016 Veritas Global Databerg report: 85 percent of the data an average organization stores is either “dark” (of unknown value) or redundant, obsolete, or trivial. Storage may be cheap per unit, but at big data volumes the total cost increases considerably. Organizations spend millions of dollars a year storing data they’ll never use. This is not just a failure of good business sense; it is a failure of data governance.
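As a minimal sketch of such a rule, assume each data set is described by a record with a hypothetical `last_used` date and that the retention window is two years (an arbitrary value chosen for illustration). Splitting the inventory into what stays and what gets archived is then a simple date comparison:

```python
from datetime import date, timedelta

# Hypothetical retention window; the right value depends on your
# business, legal, and regulatory requirements.
RETENTION_WINDOW = timedelta(days=365 * 2)


def split_by_retention(records, today, window=RETENTION_WINDOW):
    """Partition records into (current, to_archive) based on last use."""
    cutoff = today - window
    current = [r for r in records if r["last_used"] >= cutoff]
    to_archive = [r for r in records if r["last_used"] < cutoff]
    return current, to_archive


# Illustrative inventory only; the field names are assumptions.
records = [
    {"id": "a", "last_used": date(2024, 5, 1)},
    {"id": "b", "last_used": date(2020, 1, 15)},
]
current, to_archive = split_by_retention(records, today=date(2024, 6, 1))
print([r["id"] for r in current])     # ['a']
print([r["id"] for r in to_archive])  # ['b']
```

Passing `today` explicitly keeps the rule repeatable, so the same inventory always produces the same archive list for a given date.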

Note: Article originally published in TDWI Upside.


About the author

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.

