The history of big data

Run a quick Google search and you’ll soon realize that no one can agree on the true origins of the term ‘Big Data’. Some argue that it has been around since the early 1990s, crediting American computer scientist John R. Mashey, often called the ‘father of big data’, with popularizing it.

Others believe the term was coined in 2005 by Roger Mougalas and the O’Reilly Media group. Some would even argue that the idea of ‘big data’ didn’t really take off until the 2010s. But wherever you stand on the origins of the term, one thing we can all agree on is that Big Data, in practice, has been around for many, many years. It is not something completely new or confined to the last two decades, although in the last decade it did turn into a bit of a buzzword.

Over the course of centuries, people have been trying to use data analysis and analytics techniques to support their decision-making process.

The ancient history of Big Data

The earliest examples we have of humans storing and analyzing data are tally sticks, which date back to around 18,000 BCE. The Ishango Bone, discovered in 1950 in what is now the Democratic Republic of the Congo, is thought to be one of the earliest pieces of evidence of prehistoric data storage.

De Heinzelin’s detailed drawing of the Ishango bone

Paleolithic tribespeople would cut notches into sticks or bones to keep track of trading activity or supplies. They would compare sticks and notches to carry out rudimentary calculations, enabling them to make predictions such as how long their food supplies would last.

Then, in 2400 BCE, came the abacus, the first dedicated device constructed specifically for performing calculations. The first libraries also appeared around this time, representing our first attempts at mass data storage.

Around 300 BCE, the ancient Egyptians tried to capture all existing ‘data’ in the Library of Alexandria. Later, the Roman Empire carefully analyzed statistics about its military to determine the optimal distribution of its armies.

In more recent times, though, data analysis has revolutionized the modern business environment.

Big Data in the 20th century

The first major data project began in 1937, ordered by the Franklin D. Roosevelt administration after the Social Security Act became law. The government had to keep track of contributions from 26 million Americans and more than 3 million employers. IBM got the contract to develop a punch card-reading machine for this massive bookkeeping project.

The first data-processing machine appeared in 1943 and was developed by the British to decipher Nazi codes during World War II. This device, named Colossus, searched for patterns in intercepted messages at a rate of 5,000 characters per second, reducing the length of time the task took from weeks to merely hours.

A Colossus Mark 2 codebreaking computer being operated by Dorothy Du Boisson (left) and Elsie Booker (right), 1943 | Source: Wikipedia

Then, in 1965, the United States Government decided to build the first ever data centre to store over 742 million tax returns and 175 million sets of fingerprints. The plan was to transfer those records onto magnetic computer tape and store it in a single location. The project was later dropped, but it is generally accepted as the beginning of the electronic data storage era.

The internet age and the dawn of Big Data

Between 1989 and 1990, Tim Berners-Lee and Robert Cailliau created the World Wide Web and developed HTML, URLs and HTTP, all while working at CERN. The internet age, with its widespread and easy access to data, had begun, and by 1996 digital data storage had become more cost-effective than storing information on paper.

Tim Berners-Lee and Robert Cailliau

The domain google.com was registered a year later, in 1997, and the search engine launched in 1998. That fired the starting pistol on Google's climb to data dominance and on its development of numerous other technological innovations, including in the areas of machine learning, big data and analytics.

In 1998, Carlo Strozzi used the name NoSQL for his lightweight open-source relational database, which did not expose a standard SQL interface. The term was later adopted for databases that store and retrieve data modelled differently from the tabular relations found in relational databases. Then, in 1999, the first edition of How Much Information by Hal R. Varian and Peter Lyman attempted to quantify the amount of digital information available in the world at that point.

The information age

Since the early 2000s, the Internet and the Web have offered unique opportunities for data collection and data analysis. With the expansion of web traffic and online stores, companies such as Yahoo, Amazon and eBay started to analyze customer behavior by looking at click rates, IP-specific location data and search logs. This opened a whole new world of possibilities.

In 2005, Roger Mougalas used the label Big Data to refer to a large set of data that, at the time, was almost impossible to manage and process using the traditional business intelligence tools available. In the same year, Hadoop, which could handle Big Data, was created. Hadoop grew out of the open-source Nutch project and incorporated an implementation of Google’s MapReduce programming model.

Big Data has revolutionized entire industries and changed human culture and behavior. It is a result of the information age and is changing how people exercise, create music, and work.

For example, Big Data is being used in healthcare to map disease outbreaks and test alternative treatments. NASA uses Big Data to explore the universe. The music industry is replacing intuition with Big Data studies. Utilities use Big Data to study customer behavior and avoid blackouts. Nike uses health-monitoring wearables to track customers and provide feedback on their health. Cybersecurity teams use Big Data to stop crime.

The future of Big Data

Since Big Data first entered the scene, its definition, its use cases, and the technology and strategy for harnessing its value have evolved significantly across different industries. Innovations in cloud computing, quantum computing, the Internet of Things (IoT), artificial intelligence, and so on will allow Big Data to evolve further as we find new ways of harnessing its potential.

About the author

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
