MapR 5.0 Documentation: Use Spark on YARN

This section contains the following topics: Deployment Modes, Run Spark from the Spark Shell, and Run Spark Applications.

Deployment Modes

Spark is preconfigured for YARN and does not require any additional configuration to run.

 Two deployment modes can be used to launch Spark applications on YARN:

  • In yarn-cluster mode, jobs are managed by the YARN cluster. The Spark driver runs inside an Application Master (AM) process that is managed by YARN, so the client can exit after it initiates the application.
  • In yarn-client mode, the Spark driver runs in the client process, and the Application Master is used only to request resources from YARN.

MapR recommends using yarn-cluster mode instead of yarn-client mode. In yarn-cluster mode, the job still runs to completion if the Spark client that submitted it exits after submission.
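
As a rough illustration of how the two modes are selected at submission time, the sketch below prints a spark-submit command line for each mode. It assumes the Spark 1.x convention of passing the mode as the --master value. The installation path uses the example version 1.3.1, and the class name com.example.MyApp and the jar path are hypothetical placeholders, not values from this documentation.

```shell
#!/bin/sh
# Sketch only: print (do not run) a spark-submit command for a YARN mode.
# SPARK_DIR uses the example version; the class and jar are hypothetical.
SPARK_DIR="/opt/mapr/spark/spark-1.3.1"

submit_cmd() {
    mode="$1"   # expected: yarn-cluster or yarn-client
    case "$mode" in
        yarn-cluster|yarn-client) ;;
        *) echo "unsupported mode: $mode" >&2; return 1 ;;
    esac
    echo "$SPARK_DIR/bin/spark-submit --master $mode --class com.example.MyApp /path/to/my-app.jar"
}

# Driver runs inside the YARN Application Master; the client may exit.
submit_cmd yarn-cluster
# Driver runs in the client process; the client must stay alive.
submit_cmd yarn-client
```

Printing the command instead of running it keeps the sketch safe to try on a machine without a cluster. To actually submit, run the printed line from a node where Spark on YARN is installed.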

Note: In yarn-cluster mode, the local directories used by the Spark executors and the Spark driver are the local directories that are configured for YARN (yarn.nodemanager.local-dirs). If you specify a different path with SPARK_LOCAL_DIRS (as you would for Spark running in standalone mode), that path will be ignored.
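
To see which local directories YARN is configured to use, one option is to read the property from yarn-site.xml. The following is a minimal sketch, assuming the property is written as an adjacent name/value pair in that file. The helper name get_hadoop_property and the file path are hypothetical; substitute the location of yarn-site.xml on your cluster.

```shell
#!/bin/sh
# Sketch only: print the <value> of a named property from a Hadoop-style
# XML configuration file. Assumes <name> is directly followed by <value>.
get_hadoop_property() {
    file="$1"
    name="$2"
    # Join the file into one line, then extract the value after the name.
    tr -d '\n' < "$file" |
        sed -n "s|.*<name>$name</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Hypothetical path; replace with the yarn-site.xml used by your NodeManagers.
YARN_SITE="/path/to/yarn-site.xml"
if [ -f "$YARN_SITE" ]; then
    get_hadoop_property "$YARN_SITE" yarn.nodemanager.local-dirs
fi
```

If the property is not set in the file, the helper prints nothing, which means the YARN default value is in effect.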

Run Spark from the Spark Shell

 In yarn-client mode, complete the following steps to run Spark from the Spark shell:

  1. Navigate to the Spark on YARN installation directory:

    cd /opt/mapr/spark/spark-<version>/

    Replace <version> with your Spark version, for example 1.3.1.

  2. Issue the following command to run Spark from the Spark shell:

    MASTER=yarn-client ./bin/spark-shell

You must use yarn-client mode to run Spark from the Spark shell. The yarn-cluster mode is not supported.
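
The two steps above can be combined into a small wrapper, sketched below. The function names are hypothetical, and the check for the spark-shell script is an added convenience that is not part of the documented procedure.

```shell
#!/bin/sh
# Sketch only: wrap the two documented steps in one function.

# Build the installation directory for a given Spark version.
spark_dir_for_version() {
    echo "/opt/mapr/spark/spark-$1"
}

launch_yarn_client_shell() {
    dir="$(spark_dir_for_version "$1")"
    # Added safety check: stop early if this version is not installed.
    if [ ! -x "$dir/bin/spark-shell" ]; then
        echo "spark-shell not found under $dir" >&2
        return 1
    fi
    # Step 1: go to the installation directory.
    # Step 2: start the Spark shell in yarn-client mode.
    cd "$dir" && MASTER=yarn-client ./bin/spark-shell
}

# Usage (1.3.1 is the example version from the text):
#   launch_yarn_client_shell 1.3.1
```

The function is not called automatically, so sourcing the script has no side effects. Call it with the version that is installed on the node.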

Run Spark Applications

For information about running Spark applications, see the Apache Spark documentation.