BI, Analytics, and Big Data: The "A-ha!" Moment (Part 1)

This blog post is the first in a series based on the ebook The Definitive Guide to BI and Analytics on a Data Lake by Sameer Nori and Jim Scott. Their book examines the promise, potential, and significant dynamics of analytics and BI on the big data platform.

In this first chapter, the authors discuss why traditional systems cannot scale to cost-effectively handle growing volumes of data. The authors go on to describe how BI and analytics have evolved over the last three decades from being IT-driven to analyst-driven with self-service tools. Here’s an excerpt:

Without tools, data is nothing but an expense

Without the tools to organize, analyze, and properly mine that data, it remains just that—an ever-growing pile of data, doubling in volume every two years, with little or no business value. Most infrastructures and solutions in place today cannot efficiently scale to process and analyze this data.

Plus, most of this new data is unstructured or semi-structured (from here on only referred to as unstructured), hailing from sources like email, social media, video, and others. Traditional systems tend to choke when fed large volumes of data that is not completely structured.
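To put the "doubling in volume every two years" figure in perspective, here is a minimal Python sketch of the compounding involved. It assumes a constant doubling period, which is a simplification, and the function name `growth_factor` is hypothetical, chosen only for this illustration.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return how many times larger the data volume is after `years`,
    assuming it doubles once every `doubling_period_years` (an assumption)."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Print the multiple for a few illustrative horizons.
    for years in (2, 4, 10):
        print(f"after {years} years: {growth_factor(years):.0f}x the original volume")
```

Under that assumption, a store of data grows to 32 times its starting size within a decade, which is why systems that scale only by buying larger machines become costly quickly.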

For these and other reasons, all eyes today are upon solutions built from the ground up to gain business value from big data. These BI and analytics solutions for big data have one strong point of commonality, namely the Hadoop framework, and in particular Hadoop distributions that include Apache Spark. Forrester calls Hadoop “the new core of the analytical enterprise,” adding that Hadoop “is mandatory for firms that wish to double-down on advanced analytics and create insights-driven applications to help them succeed in the age of the customer.”

Old solutions won’t work for the new normal

Traditional infrastructures offering real-time analysis of normalized data in warehouses or data marts worked swimmingly for 25 years. Big data is anything but normal, both in its hockey-stick growth and in its basic format, namely unstructured.

Traditional systems cannot scale to handle the volumes in anything approaching a cost-effective way. Nor can they efficiently—if at all—process and provide a platform for analysis of data from social media, videos, emails, and other emergent sources.

Data is evolving. So it is not surprising that the worlds of BI and analytics are also in a state of metamorphosis.

BI and analytics at-a-glance

We can break down the evolution of BI and analytics into three easy pieces, or chapters.

  • 1980s and 1990s – The era of IT-driven analytics. The CFO wants a report? Go to IT and ask for it. The CEO needs a report? Ditto. The reports and spreadsheets created back then remain a mainstay of BI today. But as senior managers gained a deeper appreciation for the value of data, their demand for reports grew too. The notorious result was the report backlog.
  • 2000s – Several self-service BI and analytics tools emerged and the backlogs diminished. Still, these tools often required a measure of technical expertise. They certainly whetted appetites for more and better self-service tools as productivity within the business analyst community soared.
  • The present – Big data is synonymous with a growing variety of data formats, like JSON and complex flat schemas, often stored in a data lake. In an era of schema-free data analysis and exploration, IT support can be a thing of the past. And this translates into near-instant data analysis by non-IT stakeholders. Take marketing VPs, for example. Their ability to improve customer lifetime value and conversion rates depends on immediate access to data flowing in from campaigns and from external sources. The faster they get it and analyze it, the quicker they can adjust on-the-fly to shifting market conditions.
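As a rough illustration of what schema-free exploration can look like, the following Python sketch reads JSON records whose fields vary from one record to the next and computes a conversion rate per channel without declaring a schema first. The campaign records, field names, and function names are all hypothetical, invented for this example; on a real data lake the same idea would typically run through a SQL-on-Hadoop engine or Spark rather than plain Python.

```python
import json
from collections import Counter

# Hypothetical campaign events, one JSON object per line.
# Note that the records do not all share the same fields.
RAW_EVENTS = """
{"campaign": "spring_sale", "channel": "email", "clicks": 120, "conversions": 9}
{"campaign": "spring_sale", "channel": "social", "clicks": 300, "conversions": 12, "geo": {"country": "US"}}
{"campaign": "launch", "channel": "email", "clicks": 80}
"""


def load_events(text):
    """Parse newline-delimited JSON into a list of dicts."""
    return [json.loads(line) for line in text.strip().splitlines()]


def field_counts(events):
    """Discover which top-level fields exist and how often each appears."""
    counts = Counter()
    for event in events:
        counts.update(event.keys())
    return counts


def conversion_rate_by_channel(events):
    """Sum clicks and conversions per channel, treating missing fields as 0."""
    clicks = Counter()
    conversions = Counter()
    for event in events:
        channel = event.get("channel", "unknown")
        clicks[channel] += event.get("clicks", 0)
        conversions[channel] += event.get("conversions", 0)
    # Skip channels with no clicks to avoid dividing by zero.
    return {ch: conversions[ch] / clicks[ch] for ch in clicks if clicks[ch]}


if __name__ == "__main__":
    events = load_events(RAW_EVENTS)
    print(dict(field_counts(events)))
    print(conversion_rate_by_channel(events))
```

The structure is discovered at read time: the optional `geo` field and the missing `conversions` value are handled as they are encountered, so nobody has to define a table up front.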

Data from various recent surveys confirms the growing enthusiasm for big data analytics. One such survey from Wikibon polled 300 organizations that had either deployed or were evaluating big data analytics projects. One key finding: A majority believe big data analytics represents an entirely new source of competitive advantage, not just a complement to data warehouses and existing BI workloads.

Other significant findings include:

  • Companies are moving steadily from pilots to actual deployments of big data analytics.
  • As organizations gain more IT expertise in evaluating and deploying these solutions, increasing numbers of them report actual “success” in the deployments.
  • Initial primary use cases for big data analytics include IT operations support emphasizing cost savings, which help defray deployment costs, and ETL (extraction, transformation, and loading) of data from hetero- and homogeneous sources.
  • As users grew the number of Hadoop clusters deployed from one to two or more, the number of administrators assigned to each cluster dropped dramatically, from 3.5 for one cluster to fewer than 1.5 when three or more clusters were deployed. This dynamic reflects how rapidly IT staff acquire new skills for this environment.
  • Challenges early on include ensuring that basic integration and operational performance work smoothly together. There are also challenges to maintaining application performance at very high data volumes.
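The staffing figures in the findings above are easier to compare as totals. This short Python sketch multiplies them out; it treats 1.5 administrators per cluster as an upper bound for a three-cluster deployment, since the survey reports "less than 1.5", and the function name is hypothetical.

```python
def total_admins(clusters: int, admins_per_cluster: float) -> float:
    """Total administrator headcount across all clusters."""
    return clusters * admins_per_cluster


if __name__ == "__main__":
    one_cluster = total_admins(1, 3.5)         # survey figure for a single cluster
    three_clusters_max = total_admins(3, 1.5)  # upper bound: survey says under 1.5 each
    print(f"1 cluster:  {one_cluster} administrators")
    print(f"3 clusters: under {three_clusters_max} administrators")
    print(f"headcount grows by less than {three_clusters_max / one_cluster - 1:.0%}")
```

Read this way, tripling the number of clusters adds at most about one administrator in total, which matches the survey's point about how quickly teams build skills.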

Next Time

In the next blog post on the subject, we’ll take a deeper look into the SQL landscape, including Batch SQL, Interactive SQL, In-Memory SQL, and Operational SQL. These are covered in Chapter 2 of The Definitive Guide to BI and Analytics on a Data Lake ebook.

Compliments of MapR.

This blog post was published November 09, 2016.