Introducing the MapR Data Science Refinery


Data science is a hot topic in boardrooms right now. Everybody wants to adopt AI/ML, hire the best and brightest data scientists, and enable them to experiment and build intelligent applications. New deep learning libraries have made it possible to analyze new types of data and even gain new insights from historical data. The boom in IoT computing is generating massive amounts of data, which means there's even more demand for ML aggregation at the edge. Everybody wants in.

But we're seeing customers struggle because existing solutions don't scale far enough for them to derive business value from ML. Most solutions currently available require entirely new clusters with limited access to data and high IT overhead. As a result, models are built on whatever small samples of data those clusters can accommodate and are then deployed into production. Many of these solutions are also closed platforms that cannot be extended to include popular emerging tools and libraries.

MapR Data Science Refinery Overview

At MapR, our approach is shaped by what we're hearing from our customers. And what we're hearing is that they want a complete, open, secure, and converged solution. They want the ability to collaborate, visualize, and build while keeping things easy to deploy and manage. And they don't want another cluster.

That is why we’re launching the MapR Data Science Refinery. MapR will provide a scalable data science offering with native platform access, superior out-of-the-box security, and access to global event streaming and a multi-model NoSQL database.

The MapR Data Science Refinery

We've seen that our customers need agile, easy-to-deploy solutions that can scale to fit the needs of all types of data science teams. Within our platform, we're offering support for popular open source tooling in a small-footprint, containerized, preconfigured package that can be distributed to many data science teams across multitenant environments.

The MapR Data Science Refinery plans to initially ship with a data science notebook, Apache Zeppelin, offering:

  • Access to All Platform Assets - The MapR FUSE-based POSIX Client allows app servers, web servers, and other client nodes and apps to read and write data directly and securely on a MapR cluster as if it were a Linux filesystem. In addition, Apache Spark connectors are provided for interacting with both MapR Database and MapR Event Store.
  • Superior Security - The MapR Platform is secure by default, and Apache Zeppelin on MapR leverages and integrates with this security layer using the built-in capabilities provided by the MapR Persistent Application Container (PACC).
  • Extensibility - Apache Zeppelin is paired with the Helium framework to offer pluggable visualization capabilities.
  • Simplified Deployment - A preconfigured Docker container provides the ability to use MapR as a persistent data store. The Dockerfile is also available, allowing users to customize the image for specific application needs.
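Because the POSIX client exposes the cluster like a Linux filesystem, code running in a notebook or container can, in principle, use ordinary file I/O rather than a special API. The following is a minimal sketch under stated assumptions: the mount point `/mapr/my.cluster.com`, the `CLUSTER_ROOT` environment variable, and the helper `write_and_read` are all hypothetical and chosen for illustration. None of them is part of a MapR API.

```python
import os

# Hypothetical mount point: this sketch assumes the POSIX client has mounted the
# cluster at a local directory. Override it with the (hypothetical) CLUSTER_ROOT
# environment variable.
CLUSTER_ROOT = os.environ.get("CLUSTER_ROOT", "/mapr/my.cluster.com")


def write_and_read(root: str, relative_path: str, text: str) -> str:
    """Write text to a file under root with plain file I/O, then read it back.

    Nothing here is cluster-specific: if root is a POSIX-mounted cluster
    directory, the ordinary open()/read()/write() calls are all that is needed.
    """
    path = os.path.join(root, relative_path)
    # Create any missing parent directories, as on a local filesystem.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__" and os.path.isdir(CLUSTER_ROOT):
    # Only runs when the assumed mount point actually exists on this machine.
    print(write_and_read(CLUSTER_ROOT, "user/analyst/hello.txt", "hello from a notebook\n"))
```

Because the helper uses only standard library calls, the same function works against any local directory, which makes it easy to try out without a cluster.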

This product is supported and extended by our Data Science Quick Start Solutions (QSS), which are data science-led product-and-services offerings that enable training complex deep learning algorithms (e.g., Deep Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks) at scale.

ML is an active area of research and market innovation, and there are game-changing ML companies investing to improve data science productivity and build domain-specific machine learning solutions. As a data platform company, we want to be open and give our customers flexibility to use these solutions on the petabytes of business data they are relying on MapR to store and manage. So, we have extended this offering with selected Refinery partnerships as a holistic approach to enabling the MapR platform for all types of data science teams.

You can find out more about this offering and our partnerships at the MapR Data Science Refinery page.

This blog post was published October 24, 2017.
