The MapR Data Science Refinery is the industry’s first scalable and open data science offering with native platform access, superior out-of-the-box security, and automatic model storage with mirroring and sharing.
Built around a developer-friendly notebook with access to a variety of open source analytics tools that integrate directly with the MapR Platform, the MapR Data Science Refinery is easy to deploy. It ships as a secure, persistent, and extensible container that can be distributed to many data science teams across multi-tenant environments.
The MapR Data Science Refinery supports a broad range of open source tooling, compute and query engines, and libraries for exploration, collaboration, and visualization. By breaking down siloed roles in organizations undergoing digital transformation, it lets data scientists work as part of a data-focused DataOps team and deploy highly effective operational applications.
The vision for the MapR Data Science Refinery is to give data scientists a suite of tools for distilling insights from data and turning those insights into operational next-generation applications that drive digital transformation for your business.
The MapR Data Science Refinery makes it easy to leverage Machine Learning and Artificial Intelligence on the MapR Converged Data Platform.
The MapR Data Science Refinery is the only data science offering with secured access to all data. It connects out of the box with:
MapR XD. MapR Distributed File and Object Store (MapR XD) is a high-scale, reliable, globally distributed data store that creates a data fabric for managing files and containers. MapR XD supports the most stringent speed, scale, and reliability requirements across multiple edge, on-premises, and cloud environments.
MapR Database. MapR Database is a high-performance NoSQL (“Not Only SQL”) database management system built into the MapR Converged Data Platform. It is a highly scalable multi-model database that brings together operational and analytical workloads, as well as real-time streaming and database workloads, to enable a broader set of next-generation data-intensive applications.
MapR Event Store for Apache Kafka. MapR Event Store is a global publish-subscribe event streaming system for big data. MapR Event Store makes data available instantly to stream processing and other applications.
As a core component of the MapR Platform, MapR Event Store gives applications real-time access to data. Because MapR Event Store integrates natively with machine learning libraries, organizations can build real-time machine learning pipelines that apply ML models to data as it arrives.
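To make the idea of applying a model to streaming data concrete, here is a minimal sketch in plain Python. It does not use MapR Event Store or any MapR API: `InMemoryTopic` is a hypothetical in-memory stand-in for a topic, and `threshold_model` is a hypothetical placeholder for a trained model. In a real pipeline the events would come from a stream consumer instead.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Iterator

Event = Dict[str, float]


class InMemoryTopic:
    """Hypothetical stand-in for one topic of a publish-subscribe stream."""

    def __init__(self) -> None:
        self._events: Deque[Event] = deque()

    def publish(self, event: Event) -> None:
        self._events.append(event)

    def consume(self) -> Iterator[Event]:
        # Yield events in arrival order until the topic is drained.
        while self._events:
            yield self._events.popleft()


def threshold_model(event: Event) -> bool:
    """Placeholder for a trained model: flags readings above a fixed limit."""
    return event["temperature"] > 80.0


def score_stream(
    events: Iterable[Event], model: Callable[[Event], bool]
) -> Iterator[dict]:
    """Apply the model to each event as it arrives and attach the result."""
    for event in events:
        yield {**event, "alert": model(event)}


topic = InMemoryTopic()
for reading in (72.5, 85.1, 79.9):
    topic.publish({"temperature": reading})

scored = list(score_stream(topic.consume(), threshold_model))
alerts = [e["temperature"] for e in scored if e["alert"]]
print(alerts)  # [85.1]
```

Because `score_stream` is a generator, each event is scored when it is pulled from the stream rather than after the whole batch has been collected, which is the property a real-time pipeline depends on.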
The MapR Data Science Refinery provides a data science notebook that lets you work across many engines in one visual space. Language and library support includes Apache Spark (Spark Streaming, Spark SQL, SparkR, and PySpark), Apache Hive, Apache Pig, Apache Drill, Python, shell access to MapR-FS, and programmatic access to MapR Database and MapR Event Store from Spark.
Easy to Deploy. A Docker image is available on Docker Hub. The image includes everything required, and nothing more, to use MapR as a persistent data store for your containerized applications.
Secure. Authentication occurs at a container level to ensure containerized applications only have access to data for which they are authorized. Communications are encrypted to ensure privacy when accessing data in MapR.
Extensible. A Dockerfile is also available on GitHub, allowing you to customize the image to support your specific application needs.
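As an illustration of the deployment model described above, the following sketch assembles a `docker run` command line in Python without executing it. The image name, environment variable names, and port are hypothetical placeholders, not documented settings; consult the image's own documentation for the real values. Only the standard `docker run` options `--rm`, `-it`, `-p`, and `-e` are used.

```python
from typing import Dict, List


def build_docker_run(
    image: str, env: Dict[str, str], ports: Dict[int, int]
) -> List[str]:
    """Assemble a `docker run` argument list without executing it."""
    argv = ["docker", "run", "--rm", "-it"]
    # Publish each host port to the matching container port.
    for host_port, container_port in sorted(ports.items()):
        argv += ["-p", f"{host_port}:{container_port}"]
    # Pass cluster and user settings into the container as environment variables.
    for name, value in sorted(env.items()):
        argv += ["-e", f"{name}={value}"]
    argv.append(image)
    return argv


# Every value below is a placeholder chosen for this example.
command = build_docker_run(
    image="example/data-science-refinery:latest",
    env={"CLUSTER_NAME": "my.cluster.com", "CONTAINER_USER": "analyst"},
    ports={9995: 9995},
)
print(" ".join(command))
```

Building the argument list as data keeps per-team settings, such as the container user, in one place, which suits the multi-tenant distribution described earlier. Credentials are deliberately left out of the sketch; they should not be placed on a command line.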
The MapR Data Science Refinery comes with eight out-of-the-box visualization libraries, including Matplotlib and ggplot. Data scientists can easily load and unload other common visualization libraries through a pluggable framework. It is also easy to create and load custom visualizations.
The MapR Converged Data Platform is ideal for storing model and notebook repositories. Organizations can leverage the MapR Platform’s global namespace and superior replication capability. The MapR Platform also offers immutable snapshots to persist and deploy various versions of the same model, making it possible for data scientists to compare the performance and accuracy of each version of the model.
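The version comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MapR's snapshot mechanism: a temporary directory stands in for the model repository, each subdirectory plays the role of one persisted version, and the "model" is a hypothetical one-parameter threshold stored as JSON.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

Sample = Tuple[float, int]  # (feature value, true label)


def save_version(root: Path, name: str, threshold: float) -> None:
    """Persist one model version as JSON under its own directory."""
    version_dir = root / name
    version_dir.mkdir(parents=True)
    (version_dir / "model.json").write_text(json.dumps({"threshold": threshold}))


def accuracy(threshold: float, samples: List[Sample]) -> float:
    """Fraction of samples where (value > threshold) matches the label."""
    hits = sum(int(value > threshold) == label for value, label in samples)
    return hits / len(samples)


def compare_versions(root: Path, samples: List[Sample]) -> Dict[str, float]:
    """Score every stored version against the same evaluation set."""
    results = {}
    for model_file in sorted(root.glob("*/model.json")):
        threshold = json.loads(model_file.read_text())["threshold"]
        results[model_file.parent.name] = accuracy(threshold, samples)
    return results


samples = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save_version(root, "v1", threshold=0.7)
    save_version(root, "v2", threshold=0.5)
    scores = compare_versions(root, samples)

print(scores)  # {'v1': 0.75, 'v2': 1.0}
```

Scoring every version against one fixed evaluation set is what makes the numbers comparable. With immutable copies of each version, an earlier model can be re-scored later without the risk that it was changed in the meantime.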