MapR Data Science Refinery

MAPR DATA SCIENCE REFINERY: ENABLING SELF-SERVICE DATA SCIENCE

IMPROVE ACCURACY OF INSIGHTS AND MAKE DECISIONS IN REAL-TIME

The MapR Data Science Refinery is the industry’s first scalable and open data science offering with native platform access, superior out-of-the-box security, and automatic model storage with mirroring and sharing.

Built on a developer-friendly notebook with access to a variety of open source analytics tools that integrate directly with the MapR Platform, the MapR Data Science Refinery is easy to deploy. It ships as a secure, persistent, and extensible container that can be distributed to many data science teams across multi-tenant environments.

MapR Data Science Refinery diagram

The MapR Data Science Refinery supports a broad range of open source tooling, compute and query engines, and libraries for exploration, collaboration, and visualization. It breaks down siloed roles in organizations undergoing digital transformation, so that data scientists can work productively as part of a data-focused DataOps team and deploy highly effective operational applications.

The vision for the MapR Data Science Refinery is to give data scientists a suite of tools for distilling insights from data and turning those insights into operational next-generation applications that drive digital transformation for your business.

The MapR Data Science Refinery makes it easy to leverage Machine Learning and Artificial Intelligence on the MapR Converged Data Platform.

MAPR DATA SCIENCE REFINERY KEY CAPABILITIES

  • Direct and secure access to all data in the MapR Platform
  • Real-time machine learning pipelines
  • Broad language and library support
  • Easy deployment with persistent and stateful containers
  • Extensible visualization library support
  • Superior model sharing and mirroring

SECURE AND DIRECT ACCESS TO ALL DATA

The MapR Data Science Refinery is the only data science offering with secured access to all data. It connects out of the box with:

MapR XD. MapR Distributed File and Object Store (MapR XD) is a high-scale, reliable, globally distributed data store that creates a data fabric for managing files and containers. MapR XD supports the most stringent speed, scale, and reliability requirements across multiple edge, on-premises, and cloud environments.

MapR Database. MapR Database is a high performance NoSQL (“Not Only SQL”) database management system built into the MapR Converged Data Platform. It is a highly scalable multi-model database that brings together operations and analytics, and real-time streaming and database workloads to enable a broader set of next-generation data-intensive applications in organizations.

MapR Event Store for Apache Kafka. MapR Event Store is a global publish-subscribe event streaming system for big data. MapR Event Store makes data available instantly to stream processing and other applications.

REAL-TIME MACHINE LEARNING PIPELINES

A core component of the MapR Platform, MapR Event Store provides publish-subscribe event streaming for real-time data access. With native integration between MapR Event Store and machine learning libraries, organizations can now create real-time machine learning pipelines, allowing them to apply ML models to real-time data.
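The idea of applying an ML model to real-time data can be illustrated with a minimal Python sketch. It makes several assumptions: a plain Python list stands in for events read from a MapR Event Store topic, `toy_model` is a hypothetical stand-in for a trained model, and the field names `amount` and `retries` are invented for illustration. It shows only the shape of a streaming scoring loop, not the MapR client API.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Event = Dict[str, float]


def score_stream(
    events: Iterable[Event],
    model: Callable[[Event], float],
    threshold: float = 0.5,
) -> Iterator[Tuple[Event, float, bool]]:
    """Lazily score each event as it arrives and flag scores at or above the threshold."""
    for event in events:
        score = model(event)
        yield event, score, score >= threshold


def toy_model(event: Event) -> float:
    # Hypothetical stand-in for a trained model: a linear score clipped to [0, 1].
    raw = 0.01 * event["amount"] + 0.2 * event["retries"]
    return max(0.0, min(1.0, raw))


# Assumption: in a real deployment these events would come from a stream
# consumer; a small in-memory list is used here so the sketch runs anywhere.
sample_events = [
    {"amount": 10.0, "retries": 0.0},
    {"amount": 40.0, "retries": 2.0},
    {"amount": 5.0, "retries": 3.0},
]

results = list(score_stream(sample_events, toy_model))
flagged = [event for event, _score, is_flagged in results if is_flagged]
```

Because `score_stream` is a generator, each event is scored as soon as it is read, so the same loop would work unchanged over an unbounded event source.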

BROAD LANGUAGE AND LIBRARY SUPPORT

The MapR Data Science Refinery provides a data science notebook with the ability to work across many engines in one visual space. Language and library support includes Apache Spark (Spark Streaming, SparkSQL, SparkR, and PySpark), Apache Hive, Apache Pig, Apache Drill, Python, shell access to MapR-FS, and programmatic access to MapR Database and MapR Event Store in Spark.

EASY DEPLOYMENT WITH PERSISTENT AND STATEFUL CONTAINERS

Easy to Deploy. A Docker image is available on Docker Hub. The Docker image includes all the necessary bits—no more, no less—required to leverage MapR as a persistent data store for your containerized applications.

Secure. Authentication occurs at a container level to ensure containerized applications only have access to data for which they are authorized. Communications are encrypted to ensure privacy when accessing data in MapR.

Extensible. A Dockerfile is also available on GitHub, allowing you to customize the image further to support your specific application needs.

EXTENSIBLE VISUALIZATION LIBRARY SUPPORT

The MapR Data Science Refinery comes with eight out-of-the-box visualization libraries, including Matplotlib and ggplot. Data scientists can easily load and unload other common visualization libraries through a pluggable framework. It is also easy to create and load custom visualizations.

SUPERIOR MODEL SHARING AND MIRRORING

The MapR Converged Data Platform is ideal for storing model and notebook repositories. Organizations can leverage the MapR Platform’s global namespace and superior replication capability. The MapR Platform also offers immutable snapshots to persist and deploy various versions of the same model, making it possible for data scientists to compare the performance and accuracy of each version of the model.
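Comparing the accuracy of several persisted model versions can be sketched in a few lines of Python. The sketch is hypothetical throughout: the version names `snapshot_v1` and `snapshot_v2`, the threshold classifiers, and the holdout examples are invented for illustration, and in practice each version would be loaded from its own snapshot rather than defined inline.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[float, int]  # (feature value, true label)
Model = Callable[[float], int]


def accuracy(model: Model, examples: Sequence[Example]) -> float:
    """Fraction of examples for which the model predicts the true label."""
    correct = sum(1 for feature, label in examples if model(feature) == label)
    return correct / len(examples)


def compare_versions(
    versions: Dict[str, Model], examples: Sequence[Example]
) -> List[Tuple[str, float]]:
    """Score every version on the same holdout set, best accuracy first."""
    scored = [(name, accuracy(model, examples)) for name, model in versions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical model versions: simple threshold classifiers standing in for
# models that would be restored from two different snapshots.
versions: Dict[str, Model] = {
    "snapshot_v1": lambda x: int(x > 0.3),
    "snapshot_v2": lambda x: int(x > 0.5),
}

# Hypothetical labeled holdout set shared by all versions.
holdout: List[Example] = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]

ranking = compare_versions(versions, holdout)
```

Scoring every version against the same holdout set is what makes the comparison meaningful; immutable snapshots help here because a version cannot change between evaluations.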

KEY BUSINESS BENEFITS

  • Higher accuracy for business predictions
  • Instant insights
  • Higher data scientist productivity
  • Lower TCO
  • Ability to visualize your business
  • Intelligent business processes
