Major League Baseball implemented “Statcast” in 2014 to provide each team with seven terabytes of data per game, recorded by radar and cameras. To put a terabyte into perspective, an average Excel spreadsheet can manage four gigabytes of data. That means at least 1,750 Excel workbooks would be needed per game to house all of the recorded data. Multiply that by the ungodly length of a season (162 games) and you arrive at more than 280,000 Excel workbooks for each team.
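For anyone who wants to check the math, here is a back-of-the-envelope sketch in Python. The figures come straight from the paragraph above; the one assumption is decimal units (1 terabyte = 1,000 gigabytes), and the variable names are purely illustrative.

```python
# Figures from the paragraph above.
# Assumption: decimal units, i.e. 1 TB = 1,000 GB.
TB_PER_GAME = 7
GB_PER_TB = 1_000
GB_PER_WORKBOOK = 4
GAMES_PER_SEASON = 162

# How many 4 GB workbooks it takes to hold one game's data.
workbooks_per_game = TB_PER_GAME * GB_PER_TB // GB_PER_WORKBOOK

# Scale that up to a full season for one team.
workbooks_per_season = workbooks_per_game * GAMES_PER_SEASON

print(f"Workbooks per game:   {workbooks_per_game:,}")
print(f"Workbooks per season: {workbooks_per_season:,}")
```

Under those assumptions the season total comes to 283,500 workbooks, which is where “more than 280,000” comes from.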
The scientific name for this problem is “too much data,” and without a solution, the plot of Moneyball 2 would be Brad Pitt sitting in a dimly lit room yelling four-letter words at his laptop for the entire 90-minute movie. Asking the right questions with powerful algorithms transforms too much data into big data. Baseball teams, like companies in many industries, have adopted this approach in the never-ending pursuit of more accurate decision making.
Do not let the consultant-ese gobbledygook that surrounds this concept confuse you. Big data is just data. Anyone who has broken an Excel sheet has dabbled in big data, because the term describes data sets so massive that our current forms of processing (e.g., Excel) are incapable of making sense of them. To define the term more fully, we should assess how it is produced, how it should be used, how it should not be used, and how it affects our lives.
We are all data producers. Our smartphones track our location and provide app makers with second-by-second information. Online marketplaces record all of our clicks, and even the clicks we do not make. Surprisingly, the largest source of data in the world makes up who we are. Well-known scientist and data expert Riccardo Sabatini refers to pregnant women as the first “3D printers … assembling the biggest amount of information that you will ever encounter.” Each person’s genome fills 262,000 pages of text.
Now that’s big data.
Actually doing something with big data is a completely different challenge. IBM’s Watson uses machine learning and AI to try to make sense of all of this information. Other data processing technologies are sprouting up, claiming they can improve business performance by leaps and bounds. Nevertheless, the critical ingredient in the big data stew is the human touch.
Google’s Director of Research, Peter Norvig, famously said, “We do not have better algorithms. We just have more data.” A data-oriented company is no longer run by the highest-paid person’s opinions, but rather by that person’s questions. Without the right questions and analysis, big data is a pile of useless garbage at best, and at worst, it’s harmful. Just as statistics can be manipulated to support conflicting viewpoints, big data can produce spurious correlations. For companies assessing the data that’s valuable to them, it’s important to set standards for what that data represents. Doing so cuts down on confusion, useless analytics, and false inferences.
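To see how easily spurious correlations turn up, consider a rough sketch in Python. It generates 100 made-up “metrics” of pure random noise, ten observations each, and then searches every pair for the strongest correlation. Everything here is an illustrative assumption: the function names, the number of metrics, and the seed are invented for the example and do not come from any real data set.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(num_metrics=100, num_points=10, seed=0):
    """Return (i, j, r) for the most strongly correlated pair of noise series."""
    rng = random.Random(seed)
    # Each "metric" is pure noise: no metric has anything to do with any other.
    metrics = [[rng.random() for _ in range(num_points)]
               for _ in range(num_metrics)]
    scored = [(i, j, pearson(metrics[i], metrics[j]))
              for i, j in combinations(range(num_metrics), 2)]
    # Pick the pair whose correlation is largest in absolute value.
    return max(scored, key=lambda item: abs(item[2]))


i, j, r = strongest_chance_correlation()
print(f"Metric {i} and metric {j} correlate at r = {r:.2f}, purely by chance")
```

With nearly 5,000 pairs to choose from, some pair will look strongly related even though every number is random. That is why the question has to come before the search, not after it.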
Another major problem is the revenge of the nerds. Teams of data scientists are attempting to answer questions that do not have scientific answers. For example, scientific precision cannot make judgment calls, like ranking the best or most important anything of all time, and it struggles with cultural decisions like hiring and building teams. Big data needs big judgment to work.
The real-life applications of big data are intriguing and slightly Orwellian. Macy’s can project its Black Friday revenue based on how many mobile phones are in its parking lots. Amazon has patented “anticipatory shipping,” which uses an algorithm to ship an item before a member knows they want it. Predictive policing uses analytics to send law enforcement to locations before crime happens.
In the business world, big data enables companies to better assess risk and develop products or services based on consumer preference. Retail brands analyze and then predict customer preferences. Manufacturing companies read sensors on machinery and apply production schedules to anticipate equipment maintenance and replacement. Franchises determine locations for storefronts based on data concerning demographics, traffic analysis, and consumer behaviors. Exploration companies, before they drill, gather and assess millions of records regarding both the presence of oil and gas and their extractability.
However, the sheer amount of data proves daunting for many corporations. What data is actually helpful to the bottom line? Once that’s determined, how can it be used?
With improvements in data storage and processing technology, the term big data may soon disappear, and information will become "data" once again. Data science has always required, and always will require, the right questions and human analysis to be useful.
This article has been adapted from a chapter of Trenegy’s book, Jar(gone).
Trenegy is a non-traditional consulting firm dedicated to helping companies clarify the latest business jargon, putting it into useful terms and solutions that benefit your company. Find out more: info@trenegy.com.