Understanding Zeppelin Interpreters

Apache Zeppelin interpreters enable you to access specific languages and data processing backends. This section describes the interpreters you can use with MapR and the use cases they serve.

Supported Zeppelin Interpreters

Apache Zeppelin on MapR supports the following interpreters:

Shell

With the Shell interpreter, you can invoke system shell commands. If you have a MapR-FS mount point created with the FUSE-based POSIX client, you can access MapR-FS with standard shell commands such as ls and cat. See Running Shell Commands in Zeppelin for examples that use this interpreter.
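For example, a notebook paragraph that begins with the %sh directive runs its body as shell commands against the mount point (a minimal sketch; the cluster name and paths below are hypothetical):

```
%sh
ls /mapr/my.cluster.com/user
cat /mapr/my.cluster.com/user/alice/notes.txt
```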

Pig

The Apache Pig interpreter enables you to run Apache Pig scripts and queries. See Running Pig Scripts in Zeppelin for examples that use this interpreter.
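For example, a paragraph prefixed with %pig runs a Pig Latin script (a sketch only; the input path and field layout are hypothetical):

```
%pig
-- load a space-delimited access log and keep server-error entries
logs = LOAD '/user/alice/access_log' USING PigStorage(' ') AS (ip:chararray, code:int);
errors = FILTER logs BY code >= 500;
DUMP errors;
```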

JDBC - Drill and Hive

Apache Zeppelin on MapR provides preconfigured Apache Drill and Apache Hive JDBC interpreters. See Running Drill Queries in Zeppelin and Running Hive Queries in Zeppelin for examples that use these interpreters.
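As a sketch, assuming the preconfigured connections are named drill and hive, a paragraph selects its JDBC connection with the %jdbc(prefix) directive and runs a SQL statement. The query below uses cp.`employee.json`, a sample file bundled with Drill:

```
%jdbc(drill)
SELECT full_name, salary FROM cp.`employee.json` LIMIT 5;
```

A Hive paragraph works the same way with the hive prefix, for example %jdbc(hive) followed by a query against a Hive table.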

Livy

The Apache Livy interpreter accesses Apache Spark through Livy, a REST service for interacting with Spark. With this interpreter, you can run interactive Scala, Python, and R shells, and submit Spark jobs.
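For example, paragraphs prefixed with %livy.spark, %livy.pyspark, or %livy.sparkr run Scala, Python, or R code through Livy. A minimal Scala sketch:

```
%livy.spark
// sum the integers 1..100 on the cluster
val nums = sc.parallelize(1 to 100)
println(nums.sum())
```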

The Spark jobs run in YARN cluster mode, so they run inside an ApplicationMaster process managed by YARN. This has the following implications:

  • You can close your Zeppelin notebook without killing your Spark jobs.
  • Spark Dynamic Resource Allocation is supported, which allows you to set idle timeouts in your Spark context so that unused executor resources are released back to the cluster.
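As a sketch of the second point, dynamic allocation is controlled through Spark properties; in Zeppelin, the Livy interpreter forwards settings prefixed with livy.spark. to the Spark context. The executor counts and timeout below are illustrative values, not defaults:

```
livy.spark.dynamicAllocation.enabled             true
livy.spark.shuffle.service.enabled               true
livy.spark.dynamicAllocation.minExecutors        1
livy.spark.dynamicAllocation.maxExecutors        10
livy.spark.dynamicAllocation.executorIdleTimeout 120s
```

With settings like these, executors that sit idle longer than the timeout are released back to YARN.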

The Livy interpreter does not support ZeppelinContext and AngularBind. See the description of the Spark interpreter for details about these features.

The following topics contain examples that use the Livy interpreter to access different backend engines:

Spark

The Apache Spark interpreter is available starting in MapR Data Science Refinery 1.1. It provides an alternative to the Livy interpreter.

The Spark interpreter supports the following features not supported by the Livy interpreter:
  • ZeppelinContext - Allows you to create dynamic forms and share objects between Spark Scala and PySpark code
  • AngularBind - Allows you to display charts using data returned from Spark and to pass variables from the Spark interpreter to the Angular interpreter
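For example, ZeppelinContext is exposed in paragraphs as the variable z, so an object stored from Spark Scala code can be read back from PySpark code (the variable name and value are illustrative):

```
%spark
// store a value in ZeppelinContext from Scala
z.put("threshold", 42)

%spark.pyspark
# read the same value from PySpark
print(z.get("threshold"))
```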

The Spark interpreter launches Spark jobs in YARN client mode. In this mode, the interpreter launches the Spark driver process on the host machine of the Zeppelin container, which can result in high resource consumption on that host. You also lose the other advantages of running in YARN cluster mode described earlier for the Livy interpreter.

You can run only one version of Python in your container when using the Spark interpreter. The Livy interpreter does not have this limitation.

The following topics contain examples that use the Spark interpreter to access different backend engines:

MapR-DB Shell

The MapR-DB Shell interpreter allows you to run commands available in MapR-DB Shell (JSON Tables) in the Zeppelin UI. Using dbshell commands, you can access MapR-DB JSON tables without having to write Spark code. The interpreter supports all dbshell commands except find commands that specify an ordering.

The interpreter is available starting in MapR Data Science Refinery 1.2. No additional configuration steps are required to use it.

Specify the following in the Zeppelin UI to invoke the interpreter:
%maprdb
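For example, a %maprdb paragraph can run a dbshell find against a JSON table (a sketch only; the table path is hypothetical):

```
%maprdb
find /apps/user_profiles --limit 5
```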

See Running MapR-DB Shell Commands in Zeppelin for examples that use this interpreter.

Livy vs Spark Interpreters

The following are general guidelines for choosing between the Livy and Spark interpreters:

  • Use Livy for jobs that are long running or resource intensive
  • Use Livy if you need to run multiple Python versions
  • Use Spark if you need to use visualization features that Livy does not support
Note: Neither interpreter supports Spark standalone mode.

Zeppelin Interpreter Use Cases

The following summarizes which interpreters to use to access different backend engines for different data processing goals:

Data discovery, exploratory querying
  • Livy, Spark: Spark SQL
  • JDBC: Hive, Drill
  • Shell: MapR-FS
  • MapR-DB Shell: MapR-DB JSON

ETL, preparation
  • Livy, Spark: Spark, PySpark, Spark SQL, Spark Streaming
  • Livy, Spark: MapR-DB (through the MapR-DB Connectors for Apache Spark)
  • Livy, Spark: MapR-ES (through Spark jobs that query MapR-ES)
    Note: See MapR Data Science Refinery Support by MapR Core Version for limitations in version support when accessing MapR-ES.
  • JDBC: Hive
  • Pig: MapReduce

Machine and deep learning, data science
  • Livy, Spark: SparkML

Reporting, visualization
  • JDBC: Hive, Drill

Unsupported Zeppelin Interpreters

Apache Zeppelin on MapR does not support the HBase interpreter. To access MapR-DB binary tables, use the MapR-DB Binary Connector for Apache Spark with either the Livy or Spark interpreter.