Apache Zeppelin is a web-based notebook that enables interactive data analytics. You can create beautiful, data-driven, interactive, and collaborative documents with Spark SQL, Scala, Hive, Flink, Kylin, and more. Zeppelin enables rapid development of Spark and Hadoop workflows with simple, built-in visualizations. Code written in Zeppelin can be used in the Zeppelin notebooks or compiled and packaged into complete applications.
As of the current master branch (and release candidate), all of the MapR build profiles are included in the Apache Zeppelin repository. Four profiles (mapr3, mapr40, mapr41, and mapr50) build Zeppelin with the appropriate MapR dependencies.
This blog provides instructions for building with the MapR profiles. Support for building the Hive interpreter against MapR is included, but the MapR dependencies are commented out in the Hive interpreter's pom.xml file.
Make sure you have at least the MapR client and Spark installed on your machine. Test this by executing hadoop fs -ls / and by launching the Spark shell (for example, version 1.2.1).
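A quick sanity-check sketch; the Spark install path below is an assumption, so adjust it to wherever Spark lives on your machine:

```shell
# Verify the MapR client can reach the cluster filesystem
hadoop fs -ls /

# Launch the Spark shell to confirm Spark works (install path assumed)
/opt/mapr/spark/spark-1.2.1/bin/spark-shell
```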
Find a suitable directory and clone the Zeppelin repository.
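A sketch of the clone step; the URL below reflects Zeppelin's incubator-era GitHub location and is an assumption, so verify the current repository location before use:

```shell
# Clone the Zeppelin source tree (incubator-era URL; verify before use)
git clone https://github.com/apache/incubator-zeppelin.git
cd incubator-zeppelin
```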
Build it (version MapR 4.0.x):
mvn clean package -Pbuild-distr -Pmapr40 -Pyarn -Pspark-1.2 -DskipTests
(for version MapR 4.1):
mvn clean package -Pbuild-distr -Pmapr41 -Pyarn -Pspark-1.3 -DskipTests
(for version MapR 5.x):
mvn clean package -Pbuild-distr -Pmapr50 -Pyarn -Pspark-1.3 -DskipTests
This will create a directory called zeppelin-distribution containing a runnable version of Zeppelin and a tar file. The tar file is a complete Zeppelin installation: copy and expand zeppelin-x.x.x-incubating-SNAPSHOT.tar.gz where you want to execute the Zeppelin server. Everything is local to that machine, so it is not necessary to have the Zeppelin server on a MapR cluster node.
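The copy-and-expand step might look like the following; the version string, the tar file's location inside zeppelin-distribution, and the /opt target directory are all placeholders to adjust for your build:

```shell
# x.x.x is a placeholder; substitute the version your build produced,
# and locate the tar file inside the zeppelin-distribution directory
cp zeppelin-distribution/zeppelin-x.x.x-incubating-SNAPSHOT.tar.gz /opt
cd /opt && tar -xzf zeppelin-x.x.x-incubating-SNAPSHOT.tar.gz
```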
In the zeppelin-x.x.x-incubating-SNAPSHOT/conf directory, you will need to copy zeppelin-env.sh.template to zeppelin-env.sh. In zeppelin-env.sh, you need to export two items.
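That copy step, sketched (the version string is a placeholder for your build's):

```shell
cd zeppelin-x.x.x-incubating-SNAPSHOT/conf
cp zeppelin-env.sh.template zeppelin-env.sh
```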
export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-x.x.x/etc/hadoop (insert the correct Hadoop version & path)
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=4 -Dspark.executor.memory=2g"
The Hadoop conf directory is where yarn-site.xml lives. The Zeppelin Java Options set information about your Spark deployment. These options are explained in the Spark documentation.
This should be all you need to do at the command line. To start the Zeppelin server, execute the daemon script in the Zeppelin bin directory.
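Zeppelin ships with a daemon control script for this; run it from the Zeppelin installation directory:

```shell
# Start the Zeppelin server (stop/restart work the same way)
bin/zeppelin-daemon.sh start
```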
Now you need to configure Zeppelin to use your Spark cluster. Point your browser to the Zeppelin server's address (by default, Zeppelin listens on port 8080).
Click on Interpreter (top of the page), and edit the Spark section:
You can configure your HiveServer2 on this page as well, if you are using one. Now click on Notebook (top of the page) and select the tutorial.
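For a MapR cluster running YARN, the Spark interpreter settings might look like the following; these property values are illustrative assumptions, so adjust them to your deployment:

```
master        yarn-client
spark.home    /opt/mapr/spark/spark-x.x.x
```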
Be aware of the port number Zeppelin runs on.