Installing Spark Standalone

This topic includes instructions for using package managers to download and install Spark Standalone from the MEP repository.

For instructions on setting up the MEP repository, see Step 8: Install Ecosystem Components Manually.
Spark is distributed as three separate packages:
• mapr-spark: Install this package on Spark worker nodes. This package is dependent on the mapr-client package.
• mapr-spark-master: Install this package on Spark master nodes. Spark master nodes must be able to communicate with Spark worker nodes over SSH without using passwords. This package is dependent on the mapr-spark package and the mapr-core package.
• mapr-spark-historyserver: Install this optional package on Spark History Server nodes. This package is dependent on the mapr-spark and mapr-core packages.
Run the following commands as root or using sudo.
  1. Create the /apps/spark directory on MapR-FS, and set the correct permissions on the directory.
    hadoop fs -mkdir /apps/spark
    hadoop fs -chmod 777 /apps/spark
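    Optionally, verify the directory and its permissions; after the chmod, the spark entry in the listing should show drwxrwxrwx:
    hadoop fs -ls /apps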
  2. Use the appropriate commands for your operating system to install Spark.
    On CentOS / RedHat
    yum install mapr-spark mapr-spark-master mapr-spark-historyserver
    On Ubuntu
    apt-get install mapr-spark mapr-spark-master mapr-spark-historyserver
    On SUSE
    zypper install mapr-spark mapr-spark-master mapr-spark-historyserver
    Note: The mapr-spark-historyserver package is optional.

    Spark is installed into the /opt/mapr/spark directory.
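    To confirm the installation, you can query the package manager and list the install directory (the exact versions shown will vary by MEP release):
    # On CentOS / RedHat / SUSE
    rpm -qa | grep mapr-spark
    # On Ubuntu
    dpkg -l | grep mapr-spark
    ls /opt/mapr/spark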

  3. Copy the /opt/mapr/spark/spark-<version>/conf/slaves.template into /opt/mapr/spark/spark-<version>/conf/slaves, and add the hostnames of the Spark worker nodes. Put one worker node hostname on each line. For example:
    localhost
    worker-node-1
    worker-node-2
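    The copy itself can be done with a single cp command; the paths below assume only one Spark version is installed:
    cp /opt/mapr/spark/spark-<version>/conf/slaves.template /opt/mapr/spark/spark-<version>/conf/slaves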
  4. Set up passwordless SSH for the mapr user so that the Spark master node can reach all the slave nodes listed in the conf/slaves file.
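    One common way to do this, assuming the mapr user does not already have a key pair, is to generate a key on the master node and copy it to each worker listed in conf/slaves (the hostnames below reuse the example from step 3):
    # run as the mapr user on the Spark master node;
    # accept the default key location and leave the passphrase empty
    ssh-keygen -t rsa
    ssh-copy-id mapr@worker-node-1
    ssh-copy-id mapr@worker-node-2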
  5. As the mapr user, start the worker nodes by running the following command on the master node. Since the Master daemon is managed by the Warden daemon, do not use the start-all.sh or stop-all.sh scripts.
    /opt/mapr/spark/spark-<version>/sbin/start-slaves.sh
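    A quick way to check that the workers came up is to look for the Worker process on each slave node, or to open the Spark master web UI, which by default listens on port 8080 of the master node:
    # on each worker node, as the mapr user
    jps | grep Worker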
  6. If you want to integrate Spark with MapR Streams, install the Streams Client on each Spark node:
    • On Ubuntu:
      apt-get install mapr-kafka
    • On RedHat/CentOS:
      yum install mapr-kafka
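    The commands above cover Ubuntu and RedHat/CentOS. On SUSE, assuming the same package name applies as in step 2, the equivalent command would presumably be:
      zypper install mapr-kafka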
  7. If you want to use a Streaming Producer, add the spark-streaming-kafka-producer_2.11.jar from the MapR Maven repository to the Spark classpath (/opt/mapr/spark/spark-<version>/jars/).
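    For example, assuming the jar has already been downloaded from the MapR Maven repository into the current directory (the downloaded file name typically carries a version suffix), copying it into the jars directory places it on the classpath:
    cp spark-streaming-kafka-producer_2.11.jar /opt/mapr/spark/spark-<version>/jars/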
  8. Test your new installation by running the SparkPi example. Use the command that matches your Spark version:
    • On Spark 2.0.1:
      /opt/mapr/spark/spark-<version>/bin/run-example --master spark://<Spark Master node hostname>:7077 SparkPi 10
    • On Spark 1.6.1:
      MASTER=spark://<Spark Master node hostname>:7077 /opt/mapr/spark/spark-<version>/bin/run-example org.apache.spark.examples.SparkPi 10
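    Both commands estimate the value of pi; a successful run prints a line similar to the following (the exact digits vary between runs):
      Pi is roughly 3.14...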