MapR Data Science Refinery Tutorial

The MapR Data Science Refinery (DSR) is an easy-to-deploy and scalable data science toolkit with native access to all MapR Data Platform assets and out-of-the-box security. MapR’s DSR includes tools such as the Apache Zeppelin notebook to facilitate collaboration plus Apache Spark, Apache Drill and Apache Hive for data processing and preparation. You can find out more about the role of the Data Science Refinery and the MapR Data Platform for AI and machine learning systems in the ebook Buyer’s Guide to AI and Machine Learning.

Ready to explore the DSR? We are proud to announce a set of tutorials allowing users to easily run the MapR Data Science Refinery in their local Docker environment and connect to their MapR cluster.

This tutorial is based on a set of step-by-step guides located in the following GitHub repository:

The tutorial covers the following topics:

1. Installation and configuration options

The Data Science Refinery is deployed using containers. In this document you will learn how to configure and run the Data Science Refinery Docker container, allowing you to connect Apache Zeppelin to your MapR cluster.
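
To give a rough idea of what running the container involves, here is a minimal dry-run sketch that only prints a docker run command rather than executing it. The environment variable names (MAPR_CLUSTER, MAPR_CLDB_HOSTS, MAPR_CONTAINER_USER), the Zeppelin port 9995, and the image name are assumptions, not taken from this post. Every value is a placeholder, and dsr_run_command is a hypothetical helper. The installation guide remains the reference for the actual options.

```shell
#!/bin/sh
# Dry-run sketch: prints a `docker run` command for the DSR container
# instead of executing it. All values below are placeholders.

MAPR_CLUSTER="my.cluster.com"        # placeholder cluster name
MAPR_CLDB_HOSTS="cldb1.example.com"  # placeholder CLDB host list
MAPR_CONTAINER_USER="mapr"           # placeholder user inside the container
DSR_IMAGE="maprtech/data-science-refinery:YOUR_TAG"  # assumed image, placeholder tag
ZEPPELIN_PORT=9995                   # assumed Zeppelin port

# Hypothetical helper: assembles the command line and prints it.
dsr_run_command() {
  printf '%s\n' "docker run -it -p ${ZEPPELIN_PORT}:${ZEPPELIN_PORT} -e MAPR_CLUSTER=${MAPR_CLUSTER} -e MAPR_CLDB_HOSTS=${MAPR_CLDB_HOSTS} -e MAPR_CONTAINER_USER=${MAPR_CONTAINER_USER} ${DSR_IMAGE}"
}

dsr_run_command
```

Printing the command first makes it easy to review the cluster settings before starting a container against a real cluster.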

2. Zeppelin interpreters on MapR

Discover how Apache Zeppelin interpreters let users process data with their preferred tools, for example, Apache Spark, Hive, or Drill.

3. Zeppelin notebooks on MapR

Discover how to access or create notebooks with Zeppelin on MapR.

4. Visualization in MapR DSR

Use Helium in Apache Zeppelin to create a rich user experience in your notebook.

5. Examples of using MapR DSR with different backend engines

  1. Running Shell Commands
  2. Running Pig Scripts
  3. Running Drill Queries
  4. Running Hive Queries
  5. Running Spark Jobs
  6. Running MapR Database Shell Commands
  7. Accessing MapR Database in Zeppelin Using the MapR Database Binary Connector
  8. Accessing MapR Database in Zeppelin Using the MapR Database OJAI Connector
  9. Accessing MapR Event Store For Apache Kafka in Zeppelin Using the Livy Interpreter
  10. Accessing MapR Event Store For Apache Kafka in Zeppelin Using the Spark Interpreter

6. Installing custom packages (TensorFlow)

7. Sharing Zeppelin Notebook on MapR DSR

8. Building your own MapR DSR Docker Image

9. Troubleshooting Zeppelin on MapR DSR

10. Official documentation for DSR

The official documentation describes the MapR Data Science Refinery in detail for MapR release 6.1.

Additional Resources:

eBook Machine Learning Logistics by Ted Dunning & Ellen Friedman

eBook Getting Started with Apache Spark 2.x by Carol McDonald with Ian Downard

Webinar recording: "Getting Started with Spark 2.x and GraphFrames to Analyze Flight Delays and Distances" by Carol McDonald

This blog post was published May 22, 2019.
