Apache Spark

Apache Spark delivers in-memory processing for big data and enables faster application development

Apache Spark is a general-purpose engine for large-scale data processing. It supports rapid application development for big data and allows for code reuse across batch, interactive, and streaming applications. The most popular use cases for Apache Spark include building data pipelines and developing machine learning models. The MapR Converged Data Platform is the choice for production Spark applications.
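
Spark's streaming support processes incoming data as a series of small batches. That design is what makes it practical to reuse batch logic in streaming jobs. The plain-Python sketch below illustrates the idea without Spark itself. The event format, `count_clicks`, `micro_batches`, and `merge` are all hypothetical names. A real Spark job would express the same logic with Spark's own RDD or DataFrame transformations.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event format: (user_id, event_type).
Event = Tuple[str, str]


def count_clicks(events: Iterable[Event]) -> Dict[str, int]:
    """Business logic written once: count 'click' events per user."""
    counts: Dict[str, int] = {}
    for user_id, event_type in events:
        if event_type == "click":
            counts[user_id] = counts.get(user_id, 0) + 1
    return counts


def micro_batches(stream: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Group a (possibly unbounded) stream into fixed-size micro-batches."""
    batch: List[Event] = []
    for event in stream:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


def merge(total: Dict[str, int], partial: Dict[str, int]) -> Dict[str, int]:
    """Fold one micro-batch result into the running totals (in place)."""
    for key, value in partial.items():
        total[key] = total.get(key, 0) + value
    return total


events = [("alice", "click"), ("bob", "view"), ("alice", "click"), ("bob", "click")]

# Batch: run the logic over the whole data set at once.
batch_result = count_clicks(events)

# Streaming: run the same logic on each micro-batch and merge partial results.
stream_result: Dict[str, int] = {}
for chunk in micro_batches(iter(events), size=3):
    merge(stream_result, count_clicks(chunk))

print(batch_result == stream_result)  # True
```

Merging per-batch results works here because click counts are additive. Logic that is not additive, such as a median, needs more care when moved from batch to streaming.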

New to Apache Spark? Get the ebook "Getting Started with Apache Spark: From Inception to Production."

Key Features

Analytics on Consistent Data:

The MapR Converged Data Platform enables data scientists to perform analytics on consistent data in both development and production environments through features such as mirroring and consistent snapshots.

Secure Multi-Tenant Applications:

The MapR Converged Data Platform enables the development of reliable, secure multi-tenant applications built on Apache Spark.

Run Streaming & NoSQL Workloads Together:

The MapR Converged Data Platform enables the development of streaming and NoSQL applications on a single cluster. By using Spark Streaming, MapR Streams, and MapR-DB together, you can build real-time operational applications that ingest data at high speed.

Use Cases

Faster Batch Applications:

By processing data in memory, you can develop and deploy batch applications that run 10-100x faster in production than disk-based batch processing. Quantium uses Spark on the MapR Platform to cut processing time by 92%, a 12.5x increase in performance.
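
The two Quantium figures are consistent with each other. Cutting processing time by 92% leaves 8% of the original runtime, and the speedup is the ratio of the old runtime to the new one:

```latex
t_{\text{new}} = (1 - 0.92)\,t_{\text{old}} = 0.08\,t_{\text{old}}
\quad\Longrightarrow\quad
\text{speedup} = \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{1}{0.08} = 12.5
```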

Complex ETL Data Pipelines:

You can use the Spark stack to build complex ETL pipelines that speed up data ingestion and improve pipeline performance. Razorsight uses Spark on the MapR Platform to build a more efficient and cost-effective data pipeline, which enables the company to deliver cloud-based predictive analytics to its mobile and telecom operator customers faster.
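
An ETL pipeline chains three stages: extract raw records, transform them (clean, filter, aggregate), and load the results into a target store. The plain-Python sketch below shows that decomposition on a tiny in-memory data set. The CSV columns, the cleaning rule, and the `warehouse` sink are hypothetical. In Spark, each stage would instead be a distributed transformation over a DataFrame or RDD, but the structure of the pipeline is the same.

```python
import csv
import io
from typing import Dict, Iterator, List, Tuple

# Hypothetical raw input: one usage record per CSV line.
RAW = """subscriber,minutes,region
s1,12.5,north
s2,,south
s1,3.0,north
s3,7.25,south
"""


def extract(text: str) -> Iterator[Dict[str, str]]:
    """Extract: parse raw CSV text into dict records."""
    return csv.DictReader(io.StringIO(text))


def transform(rows: Iterator[Dict[str, str]]) -> Dict[str, float]:
    """Transform: drop incomplete rows, then total minutes per subscriber."""
    totals: Dict[str, float] = {}
    for row in rows:
        if not row["minutes"]:
            continue  # skip records with a missing measure
        sub = row["subscriber"]
        totals[sub] = totals.get(sub, 0.0) + float(row["minutes"])
    return totals


def load(totals: Dict[str, float], sink: List[Tuple[str, float]]) -> None:
    """Load: append sorted results to a sink (a stand-in for a table)."""
    sink.extend(sorted(totals.items()))


warehouse: List[Tuple[str, float]] = []
load(transform(extract(RAW)), warehouse)
print(warehouse)  # [('s1', 15.5), ('s3', 7.25)]
```

Keeping each stage a separate function makes it easy to test the cleaning rule on its own and to swap the sink without touching the transformation logic.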

Advanced Analytics:

You can use MLlib (Spark's machine learning library) and GraphX (its graph processing library) to develop applications that combine machine learning with graph analytics. This can speed up application development and let data scientists test new hypotheses faster. Novartis uses Spark on the MapR Platform to integrate and analyze a variety of data sources to accelerate drug research.
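
GraphX includes distributed implementations of graph algorithms such as PageRank. To make the idea concrete, the following is a small single-machine PageRank in plain Python. It illustrates the algorithm only and is not GraphX's API. The three-node graph is hypothetical, and the sketch assumes every node has at least one outgoing link. Scores like these could then serve as features for an MLlib model, which is one way graph and machine learning workloads combine.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Iterative PageRank over an adjacency list.

    Assumes every node appears as a key and has at least one out-link.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Every node gets the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Each node splits its current rank evenly across its out-links.
        for src, targets in links.items():
            share = rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += damping * share
        rank = new_rank
    return rank


scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(max(scores, key=scores.get))  # c
```

Because no node is a dead end, the scores always sum to 1. A production implementation would also handle dangling nodes and stop early once scores converge.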