Installing Spark on YARN

This document describes how to install Spark on YARN using manual steps. You can also install Spark on YARN using the MapR Installer.

Spark is distributed as two separate packages:

Package	Description
mapr-spark	Install this package on each node where you want to install Spark. This package depends on the mapr-client package.
mapr-spark-historyserver	Install this optional package on Spark History Server nodes. This package depends on the mapr-spark and mapr-core packages.
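Because mapr-spark depends on mapr-client, you may want to confirm that the prerequisite is already present before installing. The following sketch is our own helper (the name pkg_present is an assumption, not part of the MapR tooling) built on standard dpkg/rpm queries:

```shell
# Hedged sketch: report whether a prerequisite package (for example
# mapr-client) is installed. Uses dpkg on Ubuntu and rpm on
# RedHat/CentOS; the helper name pkg_present is our own.
pkg_present() {
  if command -v dpkg >/dev/null 2>&1; then
    dpkg -s "$1" >/dev/null 2>&1
  elif command -v rpm >/dev/null 2>&1; then
    rpm -q "$1" >/dev/null 2>&1
  else
    return 1
  fi
}

if pkg_present mapr-client; then
  echo "mapr-client is installed"
else
  echo "mapr-client is missing; install it before mapr-spark" >&2
fi
```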

To install Spark on YARN (Hadoop 2), execute the following commands as root or using sudo:

  1. Verify that JDK 1.7 or later is installed on each node where you want to install Spark.
  2. Create the /apps/spark directory on MapR-FS and set the correct permissions on the directory.
    hadoop fs -mkdir /apps/spark
    hadoop fs -chmod 777 /apps/spark
  3. Install the packages.
    • On Ubuntu:
      apt-get install mapr-spark mapr-spark-historyserver
    • On RedHat/CentOS:
      yum install mapr-spark mapr-spark-historyserver
    Note: The mapr-spark-historyserver package is optional.
  4. If you want to integrate Spark with MapR Streams, install the Streams Client on each Spark node.
    • On Ubuntu:
       apt-get install mapr-kafka
    • On RedHat/CentOS:
      yum install mapr-kafka
  5. Run the configure.sh command:
    /opt/mapr/server/configure.sh -R
  6. To test the installation, run the following command as the mapr user:
    MASTER=yarn-client /opt/mapr/spark/spark-<version>/bin/run-example org.apache.spark.examples.SparkPi 10

    This command will fail if it is run as the root user.
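Step 1 asks you to verify the JDK version but does not give a command. One way to script that check is sketched below; the helper name java_version_ok is our own, and the parsing assumes the legacy "1.x" numbering used through JDK 8 as well as the plain major versions used from JDK 9 on:

```shell
# Hedged sketch: return success if a Java version string is 1.7 or
# later. Handles both the legacy "1.8.0_181" scheme and the newer
# "11.0.2" scheme. java_version_ok is our helper, not MapR tooling.
java_version_ok() {
  major=${1%%.*}
  if [ "$major" = "1" ]; then
    rest=${1#1.}
    minor=${rest%%.*}
    [ "$minor" -ge 7 ]
  else
    [ "$major" -ge 7 ]
  fi
}

# Typical use: extract the version that `java -version` prints to
# stderr, then check it.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  if java_version_ok "$ver"; then
    echo "JDK $ver meets the 1.7 requirement"
  else
    echo "JDK 1.7 or later is required (found: $ver)" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```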