May 16, 2014 | BY Dr. Kirk Borne
What is big data? There are various definitions, nearly all of which focus appropriately on the concept of “big data”, and not on the data itself, whose volume is undeniably quite BIG and thus not particularly informative as a defining characteristic! Most definitions of the big data concept, therefore, revolve around either: (a) the 3 V’s that characterize it (Volume, Velocity, and Variety); or (b) the staunch belief that big data simply refers to data that’s not the same as the data we previously collected. I have a better definition, which defines what big data really means to the world today. I will explain what that is, after examining the two choices above.
Big Data is Big
Using definition (a) above, which simply lists characteristics of big data (in a very restrictive manner, by the way), we have violated the first rule of definitions that we all learned in grade school: defining “how something is different” is not the same as defining “what something is.” Example: What is a cheetah? Answer: A cheetah is the world’s fastest land mammal. But… what is it? Note that I have also contributed to the “3 V’s” mnemonic characterization of big data by introducing my own Top 10 list of V’s that characterize the main big data challenges – but, again, these are characteristics, not a definition.
Big Data is Unlike Previous Data
Using definition (b) from the opening paragraph, which is also restrictive, we end up again with another relative description (in this case, a negative comparison) – this is not an actual description or definition. Example: What is a wolverine? A wolverine is not a wolf. So… what is it?
A common extension to definition (b) states that big data refers to data that’s so big, so complex, and moving at such a high rate that it exceeds our existing resources for data acquisition, storage, processing, analysis, and interpretation. This is good, but again it is a comparative definition (relative to something else), not an actual definition. In fact, using this definition, one could easily argue that even the ancient Romans had big data! As a consequence of this mindset, there are many folks, especially in their online resumes, who conveniently claim to have done big data for decades! But I say: “Today’s Big Data is Not Yesterday’s Big Data!”
Big Data is Your Ticket to Data-Driven Decisions and Discovery
My current, best definition of big data, and the one that I prefer (not entirely because I created it, but mostly because I truly believe in it) is this: big data is everything, quantified and tracked. Let’s pick that apart:
- Everything – this means that every aspect of life, work, consumerism, entertainment, and play is now recognized as a source of digital information (data) about you, your world, and anything else we may encounter.
- Quantified – this means that we are storing those “everythings” somewhere, mostly in digital form, often as numbers, but not always in such formats. Nevertheless, data analytics pros and data scientists are quantifying even traditional non-numeric data sources (through pattern recognition and feature characterization in image/video streams, sonification in audio streams, text analytics and sentiment analysis in social media and other text streams, etc.). The quantification of features, characteristics, patterns, and trends in all things is enabling data mining, machine learning, statistics, and discovery at an unprecedented scale on an unprecedented number of things. The Internet of Things is just one example (albeit a very big one), but the Internet of Everything is even more awesome.
- Tracked – this means that we don’t simply quantify and measure everything just once, but we do so continuously (or at least repeatedly). This includes tracking your sentiment, your web clicks, your purchase logs, your geo-location, and your social media history; tracking the motion of every vessel on the sea, asteroids in space, trillions of particle-particle collisions in the Large Hadron Collider (in order to find the Higgs boson), and all cases of invasive species in non-indigenous environments; and tracking every car on the road, every motor in a manufacturing plant, and every moving part on an airplane. Consequently, we are seeing the emergence of smart cities, smart highways, personalized medicine, personalized education, precision farming, and so much more.
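To make “quantified” and “tracked” concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the word lists, the `quantify` and `track` functions, and the sample observations are all hypothetical, and real sentiment analysis relies on far richer models than counting words.

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption for illustration only).
POSITIVE = {"great", "love", "happy"}
NEGATIVE = {"bad", "hate", "late"}


def quantify(text):
    """Quantified: turn free text into a number
    (positive word count minus negative word count)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def track(observations):
    """Tracked: collect repeated (entity, day, text) observations
    into a per-entity series of (day, score), ordered by day."""
    series = defaultdict(list)
    for entity, day, text in sorted(observations, key=lambda o: (o[0], o[1])):
        series[entity].append((day, quantify(text)))
    return dict(series)


# Made-up sample data; not drawn from any real source.
observations = [
    ("customer_a", 2, "flight was late and food was bad"),
    ("customer_a", 1, "love the new app"),
    ("customer_b", 1, "great service great crew"),
]

history = track(observations)
for entity, points in history.items():
    print(entity, points)
```

The point of the sketch is the shape of the result: each “thing” ends up with a time-ordered series of numbers, which is the form that data mining and machine learning methods can then work on.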
All of these quantified and tracked data streams will enable smarter decisions, better products, deeper insights, greater knowledge, optimal solutions, customer-centric products, increased customer loyalty, more automated processes, more accurate predictive and prescriptive analytics, and better models of future behaviors and outcomes in business, government, security, science, healthcare, education, and more.
So, don’t be left out of the big data revolution because the terminology seems vague or daunting. Focus on your business goals, what you are trying to achieve, and big data’s three D2D’s (Data-to-Decisions, Data-to-Discovery, and Data-to-Dollars). You will then arrive at big data’s biggest meaning: big value and big ROI = Return on Innovation!
In conclusion, I was very impressed recently with the record growth of MapR, and how that success was connected with quantifying and tracking the number of snacks that are consumed by passengers on airline flights. LOL! Score one for the new definition of big data: everything, quantified and tracked!