Configuring the Spark Interpreter

The Spark interpreter is available starting in the 1.1 release of the MapR Data Science Refinery. It provides support for Spark Python, SparkR, Basic Spark, and Spark SQL jobs. To use the Spark interpreter for these variations of Spark, you must take certain actions, including configuring Zeppelin and installing software on your MapR cluster.

You must also issue your docker run command with the parameters the Spark interpreter requires. See the following material for details about these parameters:

Spark Python

The Zeppelin container includes Python 2. To run Python code with the Spark interpreter, you must also install Python on your MapR cluster nodes. If the version installed on your MapR cluster nodes differs from the version included in the container, some functionality may not work.
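One way to spot a version mismatch is to compare the Python version used by the interpreter with the versions reported by the Spark executors. The following is a minimal sketch, not part of the product: `find_version_mismatches` is a hypothetical helper, and the commented usage assumes the `sc` SparkContext that Zeppelin provides in a `%spark.pyspark` paragraph.

```python
def find_version_mismatches(driver_version, executor_versions):
    """Return the sorted executor (major, minor) versions that differ from the driver's.

    Hypothetical helper for illustration; not part of Zeppelin or Spark.
    """
    driver = tuple(driver_version[:2])
    return sorted({tuple(v[:2]) for v in executor_versions} - {driver})


# Assumed usage inside a %spark.pyspark paragraph, where Zeppelin provides `sc`:
#
#   import sys
#   executor_versions = (
#       sc.parallelize(range(8), 8)
#         .map(lambda _: tuple(sys.version_info[:2]))
#         .distinct()
#         .collect()
#   )
#   print(find_version_mismatches(sys.version_info, executor_versions))

if __name__ == "__main__":
    # Local demonstration with made-up executor versions.
    print(find_version_mismatches((2, 7, 15), [(2, 7), (2, 6), (2, 7)]))
```

An empty list means every sampled executor reported the same major and minor version as the interpreter. The sample only covers the executors that happened to run a task, so it is a spot check rather than a guarantee.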

To use Python in the Spark interpreter, specify the following in your notebook:
%spark.pyspark

To install custom Python packages, see Installing Custom Packages for PySpark. That topic also describes how to use Python 3 with custom packages.

SparkR

The Zeppelin container includes R. Some SparkR jobs also require R to be installed on your MapR cluster nodes before they can run in the Spark interpreter. If the version installed on your MapR cluster nodes differs from the version included in the container, some functionality may not work.

To use R in the Spark interpreter, specify the following in your notebook:
%spark.r

Spark Jobs

By default, the Spark interpreter is configured to submit Apache Spark jobs in YARN client mode. The interpreter does not support YARN cluster mode. Before you use the interpreter, follow the steps described in Installing Spark on YARN to install Spark on your MapR cluster.
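To confirm that a running interpreter is in YARN client mode, you can inspect the Spark settings from a notebook. The sketch below is illustrative only: `is_yarn_client_mode` is a hypothetical helper, and the commented usage assumes that your Spark version exposes `sc.master` and `sc.getConf()` in a `%spark.pyspark` paragraph.

```python
def is_yarn_client_mode(master, deploy_mode=None):
    """Return True if the given Spark settings describe YARN client mode.

    Accepts the legacy master value "yarn-client" as well as master "yarn"
    combined with deploy mode "client". A missing deploy mode is treated
    as "client", which is Spark's default.
    Hypothetical helper for illustration; not part of Zeppelin or Spark.
    """
    if master == "yarn-client":
        return True
    if master == "yarn":
        return (deploy_mode or "client") == "client"
    return False


# Assumed usage inside a %spark.pyspark paragraph, where Zeppelin provides `sc`:
#
#   conf = sc.getConf()
#   print(is_yarn_client_mode(sc.master, conf.get("spark.submit.deployMode")))

if __name__ == "__main__":
    print(is_yarn_client_mode("yarn", "client"))
    print(is_yarn_client_mode("yarn", "cluster"))
```

If the check returns False for a cluster deploy mode, the interpreter settings have been changed to a mode that the interpreter does not support.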

To run Spark jobs in parallel, you must configure the Spark interpreter, in the interpreter settings, to be instantiated Per Note.

With Per Note selected, you can use either the scoped or the isolated mode.

Hive Tables

To access Apache Hive tables using the Spark interpreter, you must make the hive-site.xml configuration file from your Hive cluster available to Spark running in your Zeppelin container. Follow the same steps that describe how to access Hive tables with the Livy interpreter.
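Before you make hive-site.xml available to the container, it can help to confirm that the file parses and names a metastore. The following is a minimal sketch using only the Python standard library. `read_hadoop_properties` is a hypothetical helper, and the sample host name is made up. The sketch assumes the usual Hadoop configuration layout of `property` elements with `name` and `value` children.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse Hadoop-style configuration XML into a dict of name -> value.

    Hypothetical helper for illustration; not part of Hive, Spark, or Zeppelin.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Made-up sample content; a real hive-site.xml usually holds many more properties.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    props = read_hadoop_properties(SAMPLE)
    print(props.get("hive.metastore.uris", "hive.metastore.uris is not set"))
```

To check a real file, read its contents and pass them to the helper. The location of hive-site.xml depends on your installation and is not assumed here.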