MapR Data Science Refinery

With the MapR Data Science Refinery, MapR provides businesses with a suite of data science tools that help them distill insights from their data and turn those insights into next-generation operational applications.

MapR has recognized the need for agile, containerized solutions that can scale to fit the needs of all types of data science teams. The MapR Platform supports popular open source tools in a preconfigured offering that can be distributed to many data science teams across a multitenant environment.


The MapR Data Science Refinery is an easy-to-deploy and scalable data science toolkit with native access to all platform assets and superior out-of-the-box security.

The MapR Data Science Refinery offers:

  • Access to All Platform Assets - The MapR FUSE-based POSIX Client allows app servers, web servers, and other client nodes and apps to read and write data directly and securely to a MapR cluster, as they would with a Linux filesystem. In addition, Apache Spark connectors are provided for interacting with both MapR Database and MapR Event Store.

  • Superior Security - The MapR Platform is secure by default, and Apache Zeppelin on MapR integrates with this security layer through the built-in capabilities of the MapR Persistent Application Client Container (PACC).

  • Extensibility - Apache Zeppelin is paired with the Helium framework to offer pluggable visualization capabilities.

  • Simplified Deployment - A preconfigured Docker container provides the ability to leverage MapR as a persistent data store.
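Because the POSIX client presents the cluster as an ordinary mounted filesystem, applications can read and write cluster data with standard file I/O rather than a MapR-specific API. The following is a minimal Python sketch under stated assumptions: the mount point (`/mapr/my.cluster.com`, overridable with a hypothetical `MAPR_MOUNT` environment variable) and the `refinery_demo` directory are illustrative, and the code falls back to a temporary directory when no mount is present so that it runs anywhere.

```python
import csv
import os
import tempfile
from pathlib import Path

# Hypothetical mount point: with the POSIX client, the cluster is assumed to
# appear as an ordinary directory. If it is absent or not writable, a
# temporary directory stands in so the sketch still runs.
MOUNT_POINT = Path(os.environ.get("MAPR_MOUNT", "/mapr/my.cluster.com"))


def data_dir() -> Path:
    """Return a writable directory: the (assumed) cluster mount, else a temp dir."""
    if MOUNT_POINT.is_dir() and os.access(MOUNT_POINT, os.W_OK):
        base = MOUNT_POINT
    else:
        base = Path(tempfile.mkdtemp())
    target = base / "refinery_demo"  # hypothetical project directory
    target.mkdir(parents=True, exist_ok=True)
    return target


def write_readings(path: Path, rows: list[tuple[str, float]]) -> None:
    """Write sensor readings as CSV using only standard file I/O."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sensor", "value"])
        writer.writerows(rows)


def read_mean(path: Path) -> float:
    """Read the CSV back and return the mean of the value column."""
    with path.open(newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]
    return sum(values) / len(values)


if __name__ == "__main__":
    out = data_dir() / "readings.csv"
    write_readings(out, [("s1", 1.0), ("s2", 2.0), ("s3", 6.0)])
    print(out, read_mean(out))
```

The point of the design is that nothing in the code is cluster-specific: the same functions work against a local disk or the mounted cluster.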

Why MapR Data Science Refinery?

Enable More Accurate Insights with Access to All Data

The MapR Data Science Refinery is the only data science offering with secured access to all data. It connects out of the box with:

MapR XD: for files and containers

  • Globally distributed data store
  • High-scale and reliable

MapR Database: a highly scalable, multi-model, NoSQL database management system

  • Supports multiple data models, including wide-column, document, key-value, and time-series

MapR Event Store for Apache Kafka: global publish-subscribe event streaming system

  • The first big data-scale streaming system built into a converged data platform
  • The only big data streaming system to support global event replication reliably at IoT scale

Create Real-Time Machine Learning Pipelines

A core component of the MapR Platform, MapR Event Store is a global publish-subscribe event streaming system for big data. With native integration between MapR Event Store and machine learning libraries, organizations can now create real-time machine learning pipelines, allowing them to apply ML models to real-time data.
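The basic shape of such a pipeline is a loop that consumes events as they arrive, applies a trained model to each one, and emits a prediction. The Python sketch below illustrates only that pattern and rests on several assumptions: the stream consumer is replaced by an in-memory list (a real deployment would read from a MapR Event Store topic through a Kafka-compatible client, not shown here), and the `Event` fields and logistic-regression weights are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Event:
    device_id: str
    temperature: float
    vibration: float


# Hypothetical pre-trained logistic-regression weights (illustrative only).
WEIGHTS = {"temperature": 0.08, "vibration": 1.5}
BIAS = -9.0


def score(event: Event) -> float:
    """Apply the model to one event and return the probability of failure."""
    z = (BIAS
         + WEIGHTS["temperature"] * event.temperature
         + WEIGHTS["vibration"] * event.vibration)
    return 1.0 / (1.0 + math.exp(-z))


def pipeline(events: Iterable[Event],
             threshold: float = 0.5) -> Iterator[tuple[str, float, bool]]:
    """Score each event as it arrives and flag likely failures."""
    for ev in events:
        p = score(ev)
        yield ev.device_id, p, p >= threshold


if __name__ == "__main__":
    # A hypothetical batch standing in for events read from a stream.
    incoming = [Event("pump-1", 60.0, 0.2), Event("pump-2", 95.0, 2.0)]
    for device, p, alert in pipeline(incoming):
        print(f"{device}: p={p:.3f} alert={alert}")
```

Because `pipeline` is a generator, it processes one event at a time and never needs the whole stream in memory, which is the property a real-time consumer loop requires.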

Increase Data Science Productivity with Broad Language and Library Support

The MapR Data Science Refinery includes the Apache Zeppelin notebook, which lets data scientists work across many engines in one visual space:

  • Distributed Compute and ML programming with Apache Spark & Python
  • Batch and Interactive SQL with Apache Hive and Drill
  • Scripting support for Apache Pig
  • Shell access to MapR-FS
  • Programmatic access to MapR Database and MapR Event Store, using Spark

Easy Deployment with Persistent and Stateful Containers

Easy To Deploy

  • A Docker image is available on Docker Hub.
  • The image includes exactly what is required to use MapR as a persistent data store for your containerized applications, no more and no less.

Secure

  • Authentication occurs at the container level to ensure containerized applications only have access to data they are authorized to use.
  • Communications are encrypted to ensure privacy when accessing data in MapR.

Extensible

  • It's easy to install deep learning libraries in the container or to add further tools to support your specific application needs.

Persistent

  • The container can use all the MapR Platform services (MapR XD, MapR Database, MapR Event Store) as a persistent data store.

Provide Robust Visualization Support to Data Scientists

The MapR Data Science Refinery comes with eight out-of-the-box visualization libraries, including Matplotlib and ggplot2. Apache Zeppelin provides a pluggable visualization framework that enables:

  • Common visualization libraries available in the npm registry
  • The ability to easily create and load custom visualizations

Enable Notebook/Model Collaboration, Sharing, and Mirroring

The MapR Converged Data Platform is ideal for storing model and notebook repositories. Organizations can leverage the MapR Platform’s global namespace and superior replication capability. The MapR Platform also offers immutable snapshots to persist and deploy various versions of the same model, making it possible for data scientists to compare the performance and accuracy of each version of the model.
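Comparing model versions then amounts to loading each stored version and evaluating it against the same holdout data. The sketch below is hypothetical in its details: it assumes each version (for example, a copy exposed by a snapshot) is readable as a `model.json` file under its own directory, and it uses a one-parameter threshold classifier purely so that the comparison logic stays visible.

```python
import json
import tempfile
from pathlib import Path


# Hypothetical layout: each stored model version lives in its own directory
# as model.json (directory and file names are illustrative).
def save_version(root: Path, version: str, threshold: float) -> None:
    """Persist a toy model version; stands in for a snapshotted copy."""
    d = root / version
    d.mkdir(parents=True, exist_ok=True)
    (d / "model.json").write_text(json.dumps({"threshold": threshold}))


def load_threshold(root: Path, version: str) -> float:
    """Load the single parameter of a stored model version."""
    return json.loads((root / version / "model.json").read_text())["threshold"]


def accuracy(threshold: float, holdout: list[tuple[float, int]]) -> float:
    """Fraction of holdout examples where (score >= threshold) matches the label."""
    correct = sum(int((x >= threshold) == bool(y)) for x, y in holdout)
    return correct / len(holdout)


def compare_versions(root: Path,
                     holdout: list[tuple[float, int]]) -> dict[str, float]:
    """Evaluate every stored model version on the same holdout set."""
    return {
        d.name: accuracy(load_threshold(root, d.name), holdout)
        for d in sorted(root.iterdir())
        if (d / "model.json").is_file()
    }


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())  # stands in for the snapshot area
    save_version(root, "v1", threshold=0.5)
    save_version(root, "v2", threshold=0.7)
    holdout = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
    for version, acc in compare_versions(root, holdout).items():
        print(f"{version}: accuracy={acc:.2f}")
```

Evaluating every version against one fixed holdout set is what makes the accuracy numbers comparable; immutable snapshots help by guaranteeing that each version's files do not change between runs.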

How Your Business Benefits from the MapR Data Science Refinery


Higher Accuracy for Business Predictions

Machine learning models are only as good as the data they are trained on. With the MapR Data Science Refinery, data scientists get access to all data, which improves the accuracy of the models.

Instant Insights

Using MapR Event Store, the MapR Data Science Refinery allows data scientists to create real-time machine learning pipelines. Organizations can now apply machine learning models to real-time data to gain instant business insights.

Higher Data Scientist Productivity

The MapR Data Science Refinery provides access to a broad range of popular data science tools and libraries, so data scientists can work with the tools of their choice. As a result, data scientists are more productive.

Lower TCO

The MapR Data Science Refinery is easy to deploy and manage. It also provides in-place access to data, eliminating the additional hardware otherwise needed to hold copies of that data. As a result, the MapR Data Science Refinery has a lower TCO than other data science offerings.

Visualize Your Business

The MapR Data Science Refinery provides broad, pluggable visualization support, helping business leaders and decision makers visualize the business as it happens.

Intelligent Processes

The MapR Data Science Refinery helps organizations incorporate machine learning and AI into day-to-day business workflows, enabling intelligent processes that can operate without human intervention.


Refinery Partners

ML is an active area of research and market innovation. Game-changing ML companies are investing to improve data science productivity and to build domain-specific machine learning solutions. As a data platform company, we want to be open and give our customers the flexibility to use these solutions on the petabytes of business data they rely on MapR to store and manage. MapR has a robust Converged Partner program, and we're extending this program with selected Refinery partnerships as a holistic approach to enabling the MapR Platform for all types of data science teams.

Learn More

The MapR Community’s Data Science Refinery Page is the place to go for:

  • Partner Solutions
  • Demos
  • Blogs
  • Books
  • Webinars