History of Big Data

Do a quick Google search and you’ll soon realize that no one can really agree on the true origins of the term ‘Big Data’. Some argue that it has been around since the early 1990s, crediting American computer scientist John R. Mashey, considered the ‘father of big data’, with making it popular.

Others believe the term was coined in 2005 by Roger Mougalas and the O’Reilly Media group. And some would even argue that the idea of ‘big data’ didn’t really take off until the 2010s. But wherever you stand on the origins of the term, one thing we can all agree on is that the practice behind Big Data has been around for many, many years. It is not something completely new or confined to the last two decades, although in the last decade the term did turn into a bit of a buzzword.

For centuries, people have been using data analysis techniques to support their decision-making.

The ancient history of Big Data

The earliest examples we have of humans storing and analyzing data are tally sticks, which date back to around 18,000 BCE. The Ishango Bone, discovered in 1950 in what is now the Democratic Republic of the Congo, near the border with Uganda, is thought to be one of the earliest pieces of evidence of prehistoric data storage.

De Heinzelin’s detailed drawing of the Ishango bone

Paleolithic tribespeople would mark notches into sticks or bones, to keep track of trading activity or supplies. They would compare sticks and notches to carry out rudimentary calculations, enabling them to make predictions such as how long their food supplies would last.

Then, in 2400 BCE, came the abacus, the first dedicated device constructed specifically for performing calculations. The first libraries also appeared around this time, representing our first attempts at mass data storage.

Around 300 BCE, the ancient Egyptians tried to capture all existing ‘data’ in the Library of Alexandria. The Roman Empire, moreover, carefully analyzed statistics of its military to determine the optimal distribution of its armies.

But it is in more recent times that data analysis has revolutionized the modern business environment.

Big Data in the 20th century

The first major data project was created in 1937, ordered by the Franklin D. Roosevelt administration after the Social Security Act became law. The government had to keep track of contributions from 26 million Americans and more than 3 million employers. IBM got the contract to develop a punch card-reading machine for this massive bookkeeping project.

The first data-processing machine appeared in 1943 and was developed by the British to decipher Nazi codes during World War II. This device, named Colossus, searched for patterns in intercepted messages at a rate of 5,000 characters per second, reducing the length of time the task took from weeks to merely hours.

A Colossus Mark 2 codebreaking computer being operated by Dorothy Du Boisson (left) and Elsie Booker (right), 1943 | Source: Wikipedia

Then, in 1965, the United States Government decided to build the first-ever data centre to store over 742 million tax returns and 175 million sets of fingerprints. The plan was to transfer those records onto magnetic computer tape that would be stored in a single location. The project was later dropped but is generally accepted as the beginning of the electronic data storage era.

The internet age and the dawn of Big Data

Between 1989 and 1990, Tim Berners-Lee and Robert Cailliau created the World Wide Web and developed HTML, URLs and HTTP, all while working for CERN. The internet age, with its widespread and easy access to data, had begun, and by 1996 digital data storage had become more cost-effective than storing information on paper.

Tim Berners-Lee and Robert Cailliau

The domain google.com was registered a year later, in 1997, and the search engine launched the following year, in 1998. This fired the starting pistol on Google's climb to data dominance and on the development of numerous other technological innovations, including in the areas of machine learning, big data and analytics.

In 1998, Carlo Strozzi used the name NoSQL for his lightweight open-source relational database, which did not expose a standard SQL interface. The term was later adopted for databases that store and retrieve data modelled differently from the tabular relations found in relational databases. Then, in 1999, the first edition of How Much Information by Hal R. Varian and Peter Lyman attempted to quantify the amount of digital information available in the world at that point.

The information age

Since the early 2000s, the Internet and the Web have offered unique data collection and data analysis opportunities. With the expansion of web traffic and online stores, companies such as Yahoo, Amazon and eBay started to analyze customer behavior by looking at click rates, IP-specific location data and search logs. This opened a whole new world of possibilities.

In 2005, Roger Mougalas used the label Big Data to refer to large sets of data that, at the time, were almost impossible to manage and process using the traditional business intelligence tools available. In the same year, Hadoop, which could handle Big Data, was created. Hadoop grew out of Nutch, an open-source web search project, and incorporated an implementation of Google’s MapReduce programming model.

Big Data has revolutionized entire industries and changed human culture and behavior. It is a result of the information age and is changing how people exercise, create music, and work.

For example, Big Data is being used in healthcare to map disease outbreaks and test alternative treatments. NASA uses Big Data to explore the universe. The music industry is replacing intuition with Big Data studies. Utilities use Big Data to study customer behavior and avoid blackouts. Nike uses health-monitoring wearables to track customers and provide feedback on their health. And cybersecurity teams use Big Data to stop crime.

The future of Big Data

Since Big Data first entered the scene, its definition, its use cases, and the technology and strategies for harnessing its value have evolved significantly across different industries. Innovations in cloud computing, quantum computing, the Internet of Things (IoT), artificial intelligence, and other fields will allow Big Data to evolve further as we find new ways of harnessing its potential.


    About the author 

    George Firican

    George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
