This blog post is the first in a series based on the ebook The Definitive Guide to BI and Analytics on a Data Lake by Sameer Nori and Jim Scott. Their book examines the promise, potential, and significant dynamics of analytics and BI on the big data platform.
In this first chapter, the authors discuss why traditional systems cannot scale to cost-effectively handle growing volumes of data. The authors go on to describe how BI and analytics have evolved over the last three decades from being IT-driven to analyst-driven with self-service tools. Here’s an excerpt:
Without the tools to organize, analyze, and properly mine that data, it remains just that—an ever-growing pile of data, doubling in volume every two years, with little or no business value. Most infrastructures and solutions in place today cannot efficiently scale to process and analyze this data.
Plus, most of this new data is unstructured or semi-structured (from here on, we refer to both simply as unstructured), coming from sources like email, social media, video, and others. Traditional systems tend to choke when fed large volumes of data that is not fully structured.
For these and other reasons, all eyes today are on solutions built from the ground up to gain business value from big data. These BI and analytics solutions for big data share one strong point of commonality: the Hadoop framework, and in particular Hadoop distributions that include Apache Spark. Forrester calls Hadoop “the new core of the analytical enterprise,” adding that Hadoop “is mandatory for firms that wish to double-down on advanced analytics and create insights-driven applications to help them succeed in the age of the customer.”
Traditional infrastructures offering real-time analysis of normalized data in warehouses or data marts worked swimmingly for 25 years. Big data is anything but normal, both in its hockey-stick growth and in its basic format, namely unstructured.
Traditional systems cannot scale to handle the volumes in anything approaching a cost-effective way. Nor can they efficiently—if at all—process and provide a platform for analysis of data from social media, videos, emails, and other emergent sources.
Data is evolving. So it is not surprising that the worlds of BI and analytics are also in a state of metamorphosis.
We can break down the evolution of BI and analytics into three easy pieces, or chapters.
Data from various recent surveys confirms the growing enthusiasm for big data analytics. One such survey from Wikibon polled 300 organizations that had either deployed or were evaluating big data analytics projects. One key finding: a majority believe big data analytics represents an entirely new source of competitive advantage, not just a complement to data warehouses and existing BI workloads.
Other significant findings include:
In the next blog post on the subject, we’ll take a deeper look at the SQL landscape, including Batch SQL, Interactive SQL, In-Memory SQL, and Operational SQL. These are covered in Chapter 2 of The Definitive Guide to BI and Analytics on a Data Lake ebook.
Compliments of MapR.