What's New in MapR Data Science Refinery 1.3

MapR Data Science Refinery 1.3 introduces new features and some changes in behavior for existing features from prior releases. This release requires that you connect to a MapR 6.1.0 (or later) cluster.

New Features

The following are new features in this release:

Interpreter Lifecycle Management

Prior to the 1.3 release, if a Zeppelin interpreter was idle and using excessive resources, you had to either restart or kill the interpreter to reclaim those resources. Starting in 1.3, MapR Data Science Refinery terminates interpreters that have been idle for an hour. You can configure this timeout threshold. See Idle Interpreter Timeout Threshold for details.
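
The idle-timeout policy described above can be pictured with a small Python sketch. This is only an illustration of the rule, not the product's implementation: the function name `should_terminate` and the constant `DEFAULT_IDLE_THRESHOLD` are hypothetical, and the one-hour default is taken from the description above.

```python
from datetime import datetime, timedelta

# Assumption: mirrors the one-hour default described in the text.
DEFAULT_IDLE_THRESHOLD = timedelta(hours=1)


def should_terminate(last_activity, now, threshold=DEFAULT_IDLE_THRESHOLD):
    """Return True if the interpreter has been idle for at least `threshold`."""
    return now - last_activity >= threshold


now = datetime(2019, 1, 1, 12, 0, 0)
# Interpreter last used 90 minutes ago.
print(should_terminate(datetime(2019, 1, 1, 10, 30, 0), now))
# Interpreter last used 15 minutes ago.
print(should_terminate(datetime(2019, 1, 1, 11, 45, 0), now))
```

Passing a different `threshold` corresponds, loosely, to changing the configurable timeout.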

Helium Repository Browser

Starting with the 1.3 release, MapR Data Science Refinery supports the Helium repository browser for enabling Zeppelin visualization packages. This provides a simpler procedure for enabling these packages. See Using Visualization Packages in Zeppelin for detailed instructions.

Configuration Storage

Starting with the MapR Data Science Refinery 1.3 release, you can store certain Zeppelin configuration files in MapR Filesystem, which enables you to share them across multiple containers. See Configuration Storage for more details.

Default Drill JDBC Connection String

Starting with MapR Data Science Refinery 1.3, you can configure the default Drill JDBC connection URL. See Default Drill JDBC Connection URL for more information.

Building your own Docker Image

Starting with the 1.3 release, you can build your own custom Docker image of MapR Data Science Refinery. See Building your own MapR Data Science Refinery Docker Image for more information.

Changes in Existing Features

The following sections describe changes in behavior from prior releases:

YARN Cluster Mode for Spark Interpreter Jobs

Prior to the 1.3 release, Spark interpreter jobs ran in YARN client mode. Starting in 1.3, they run in YARN cluster mode, which reduces Spark resource utilization on the host machine of your MapR Data Science Refinery container. See Understanding Zeppelin Interpreters - Spark for details.

Shared Livy Sessions

In prior releases, the Livy interpreter used separate Livy sessions for Spark, PySpark, and SparkR jobs. Starting in the 1.3 release, it uses a single shared Livy session to run all Spark variations. This reduces resource utilization in your MapR cluster.

Sequential Execution of Notebook Paragraphs

Starting with the 1.3 release, MapR Data Science Refinery runs paragraphs in a notebook sequentially rather than in parallel. This allows paragraphs to run properly when they have dependencies on earlier paragraphs in the same notebook.
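
To see why ordering matters, consider this minimal Python sketch. It models paragraphs as plain functions that share one context; `run_notebook`, `load`, and `summarize` are hypothetical names for illustration and do not correspond to any Zeppelin API.

```python
def run_notebook(paragraphs):
    """Run paragraphs one at a time, in order, sharing a single context."""
    context = {}
    for paragraph in paragraphs:
        paragraph(context)
    return context


def load(ctx):
    # First paragraph: produces data for later paragraphs.
    ctx["rows"] = [3, 1, 2]


def summarize(ctx):
    # Second paragraph: depends on the value set by the earlier paragraph.
    ctx["total"] = sum(ctx["rows"])


result = run_notebook([load, summarize])
print(result["total"])
```

If `summarize` ran before or alongside `load`, the lookup of `ctx["rows"]` could fail, which is the kind of dependency problem that sequential execution avoids.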

Hive JDBC Interpreter and Secure MapR Clusters

Starting with the 1.3 release, you must specify ssl=true in your Hive JDBC URL when connecting to a secure MapR cluster. See Hive JDBC for an example.
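
As a rough sketch, the following Python helper adds `ssl=true` to a Hive JDBC URL. It assumes the usual `jdbc:hive2://host:port/db;key=value` layout, where session settings follow the database name and any `?` or `#` suffix comes last. The helper name `with_ssl` and the host `node1` are hypothetical; see Hive JDBC for the documented example.

```python
def with_ssl(jdbc_url):
    """Return the URL with ssl=true among its ;key=value session settings."""
    # Keep any ?... or #... suffix intact so ssl=true lands before it.
    cut = len(jdbc_url)
    for sep in ("?", "#"):
        idx = jdbc_url.find(sep)
        if idx != -1:
            cut = min(cut, idx)
    head, tail = jdbc_url[:cut], jdbc_url[cut:]

    parts = head.split(";")
    base = parts[0]
    # Drop empty entries and any existing ssl setting, then add ssl=true.
    settings = [s for s in parts[1:] if s and not s.lower().startswith("ssl=")]
    return ";".join([base] + settings + ["ssl=true"]) + tail


# Hypothetical host and port, for illustration only.
print(with_ssl("jdbc:hive2://node1:10000/default"))
```

The helper replaces an existing `ssl=` setting rather than duplicating it, so applying it twice yields the same URL.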

Python Versions with the Livy Interpreter

Starting with the 1.3 release, you can no longer run both Python 2 and Python 3 with the Livy interpreter. You can run only one or the other. By default, the interpreter runs Python 2. To switch to Python 3, see Python Version.

This limitation also applies if you are installing custom Python packages. See Installing Custom Packages for PySpark Using Conda for instructions on how to install Python 2 versus Python 3 custom packages.

Running Zeppelin as a Kubernetes Service

The DEPLOY_MODE parameter in your Kubernetes pod manifest file has been renamed to ZEPPELIN_DEPLOY_MODE. You can still use DEPLOY_MODE, but MapR Data Science Refinery 1.3 returns a warning indicating that the parameter is deprecated. See Running MapR Data Science Refinery as a Kubernetes Service for an example of a pod manifest file.
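
The rename-with-fallback behavior can be sketched in Python as follows. This illustrates the pattern only and is not the product's code: `resolve_deploy_mode` is a hypothetical helper, the value `kubernetes` is a placeholder, and giving the new name precedence when both are set is an assumption that the text above does not confirm.

```python
import warnings


def resolve_deploy_mode(env):
    """Prefer ZEPPELIN_DEPLOY_MODE; fall back to deprecated DEPLOY_MODE."""
    # Assumption: the new name wins if both parameters are present.
    if "ZEPPELIN_DEPLOY_MODE" in env:
        return env["ZEPPELIN_DEPLOY_MODE"]
    if "DEPLOY_MODE" in env:
        warnings.warn(
            "DEPLOY_MODE is deprecated; use ZEPPELIN_DEPLOY_MODE instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return env["DEPLOY_MODE"]
    return None


# Placeholder value, for illustration only.
print(resolve_deploy_mode({"ZEPPELIN_DEPLOY_MODE": "kubernetes"}))
```

Calling the helper with only the old name still returns the value but also emits a `DeprecationWarning`, which matches the behavior described above.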

Notebook Storage Using MapR Filesystem

Starting with the 1.3 release, to store your notebooks in MapR Filesystem, you no longer need to use the FUSE-based POSIX client. See Notebook Storage for details.