We've heard of big data or small data, but what is this concept of dark data?

Being a Star Wars fan, my mind went straight into that lore to make an association between dark data and the dark side of the Force. Geek alert, right? Well, Star Wars is fictional, but there's nothing fictional about dark data. So what is dark data?

What is dark data?

There are actually a couple of views on what dark data is. Let's go over the first:

Dark data definition

Dark data is data that an organization acquires through various processes and stores during regular business activities, but does not use in any manner to derive insights, make decisions, or monetize.

Examples of dark data

I don't know about you, but I understand things a lot better when I'm given examples. The way I see it, dark data is like all of the photos on your phone. Most of them will never be used or even viewed again, but they are there. According to Gigaom, the average person has 630 photos stored on their phone. And this was in 2015, so you can bet that number has increased considerably. Back in 2017, InfoTrends estimated that over 1.2 trillion photos would be taken that year.

What about some examples from the business side? Well, a good example of dark data could be data generated by sensors. IBM estimates that roughly 90% of the data produced by sensors and analog-to-digital conversions never gets used.

Let's recall that dark data represents all the information companies collect in their regular business processes, don’t use, have no plans to use, but will never throw out. This includes things like:

  • Web logs
  • Visitor tracking data
  • Surveillance footage
  • Email correspondences from past employees
  • Old versions of documents
  • Raw survey data 
  • Notes or presentations
  • Maybe even transactional data
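
If you prefer to see things in code, here is a minimal Python sketch of how this first definition could be applied to a data inventory. Everything in it is an assumption made up for illustration: the `DataAsset` record, its field names, the sample inventory, and the one-year cutoff. It is not a standard way of measuring dark data, just one way to picture "stored but not used".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class DataAsset:
    # Hypothetical inventory record; the field names are illustrative only.
    name: str
    collected_on: date
    last_used_on: Optional[date]  # None means the data was never used


def is_dark(asset: DataAsset, today: date, max_idle_days: int = 365) -> bool:
    """Treat an asset as dark if it was never used, or not used within the cutoff."""
    if asset.last_used_on is None:
        return True
    return (today - asset.last_used_on) > timedelta(days=max_idle_days)


def find_dark_assets(assets: List[DataAsset], today: date,
                     max_idle_days: int = 365) -> List[str]:
    """Return the names of all assets that count as dark under the assumed cutoff."""
    return [a.name for a in assets if is_dark(a, today, max_idle_days)]


# Made-up sample inventory, loosely based on the examples listed above.
inventory = [
    DataAsset("web_logs", date(2021, 1, 10), None),
    DataAsset("raw_survey_data", date(2020, 6, 1), date(2020, 7, 15)),
    DataAsset("sales_transactions", date(2023, 1, 1), date(2023, 12, 20)),
]

dark = find_dark_assets(inventory, today=date(2024, 1, 1))
print(dark)
```

In this sketch the web logs and the raw survey data are flagged, while the recently used transactional data is not. The cutoff is a judgment call, and a real inventory would need a reliable record of when each data set was last used.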

Dark data: an alternative definition

This was the main definition of dark data, but as I mentioned, there are a couple of views on what dark data is.

In this second definition, dark data refers to:

Any data that is collected for a specific purpose, but not used for other suitable purposes as well.

Let's take a healthcare example. There's a lot of data being produced and collected from our smart devices like cell phones and tablets, thermostats and humidifiers, and virtual assistants like Google and Alexa. 

The data collected by these devices and services is not collected for healthcare purposes. Therefore, from a healthcare point of view, this is dark data.

In that sense, we as individuals create a lot of dark data. Every time we:

  • Make an online purchase
  • Use our GPS
  • Use the check-in function on Facebook
  • Track our calories in a phone app
  • Monitor our physical activities with our smartphone
  • Have our smartwatch record our biometrics

To a hospital or a healthcare professional, for example, all this data that you're generating is dark data because it's neither collected nor used for healthcare purposes. Nevertheless, it could be used for healthcare purposes, and therein lies the value of dark data.

As you can imagine, in our healthcare example, medical staff could benefit greatly from having access to your dark data, as it would give them a more holistic view of your lifestyle. This could result in better treatment, one that matches your lifestyle, and in more applicable preventive medicine.

Research from Western Digital and Accenture found that dark data similar to what I described could save 200 million sick days in the US and add 200 billion USD in value across the healthcare system by 2030. And these are just the benefits of the healthcare industry tapping into dark data.

I'm not a big fan of this definition because then anything and everything could be considered dark data.


Similar to dark matter in physics, dark data often makes up most of an organization's universe of information assets. Organizations often retain dark data for compliance purposes or record keeping. Some also believe that dark data could be useful to them in the future, once they have acquired better analytics and business intelligence technology to process the information. Because storage is inexpensive, storing data is easy, so why not? The catch is that storing and securing data typically incurs more expense (and sometimes greater risk) than value. More about that in a separate article.

About the author 

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
