Bringing the Power of Spark-based OLAP to Hadoop

In the world of data warehouses and data marts, OLAP analysis has existed for many years. Concepts like drill-down, drill-across, and roll-up have allowed business analysts and users to easily access and analyze data across a variety of dimensions such as product, customer, and region. This has enabled analyses such as product profitability and customer profitability, and it continues to be a key use case in the business intelligence and analytics world.

Fast forward to the big data era, where the growth in semi-structured and unstructured data has dwarfed the growth of structured data. Batch analytics has been the mainstay of the first wave of Hadoop applications and has been accompanied by the emergence of data scientists who’ve been able to tap into the potential of big data. There is, however, a large population of business users and analysts who continue to use traditional analysis techniques and methodologies. Methods of analysis such as OLAP need to become more easily accessible on the new data types and sources being stored in Hadoop and NoSQL systems, while still tying back to structured data sources such as ERP. We believe SAP HANA Vora solves this problem at scale for enterprise customers.

SAP HANA Vora is an in-memory query engine that leverages Apache Spark to provide enriched interactive analytics. At its core, SAP HANA Vora is an Apache Spark-based product that enables OLAP analysis on large volumes of data stored in Hadoop across thousands of nodes. By simplifying ownership of big data and broadening access to it for analytics, SAP HANA Vora makes it possible for you to make precise, context-aware decisions.

SAP HANA Vora solves key big data challenges by providing:

  • Data correlation for making precise contextual decisions – Enables mashup of operational business data with external unstructured data sources for more powerful analytics.
  • Simplified management of big data – Allows data to be processed locally on a Hadoop cluster, reducing data ownership and integration challenges.
  • OLAP modeling capabilities on Hadoop data – Real-time drill-down analysis is possible on large volumes of Hadoop data distributed across thousands of nodes.

So, how might this enhance your data architecture today? Let’s examine the use case of the 360-degree view of the customer. Many retailers collect Twitter and Facebook data containing comments about the company and its products. However, beyond monitoring general brand sentiment, they are not using this data to inform decisions, which happens more often than you’d think. They suspect the social media data they collect holds valuable information that could improve their decision making, but they lack the tools to get to that insight today. In this example, the retailer could use the MapR Data Platform and SAP HANA Vora to store and access semi-structured and structured data and apply traditional OLAP analysis to these datasets. By uniting business data such as sales and profitability with these social feeds, a retailer can correlate sentiment with changes in product sales and use this insight to enhance the models driving inventory set points, production quotas, and even pricing. When data from disparate sources is combined, the insights gleaned can go from interesting to game changing. Together, MapR and SAP HANA Vora can help customers make the leap.

Spark-based OLAP with SAP and MapR

Together with SAP, we look forward to bringing the power of OLAP to Hadoop at enterprise scale, and thereby bringing contextual analytics to all data stored in Hadoop, enterprise systems, and other distributed data sources.

This blog post was published March 15, 2016.
