Spark is preconfigured for YARN and does not require any additional configuration to run.
Two deployment modes can be used to launch Spark applications on YARN:

In yarn-cluster mode, jobs are managed by the YARN cluster. The Spark driver runs inside an Application Master (AM) process that is managed by YARN, which means the client can go away after initiating the application.

In yarn-client mode, the Spark driver runs in the client process, and the Application Master is used only to request resources from YARN.
MapR recommends using yarn-cluster mode instead of yarn-client mode: if the Spark client that submitted the job exits after submission, there is no impact on actual job completion.
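As a sketch of the recommended mode, a job can be submitted in yarn-cluster mode with spark-submit. The jar path below assumes the default MapR installation layout and uses the SparkPi example bundled with Spark; adjust both for your environment.

```shell
# Submit the bundled SparkPi example in cluster mode. The driver runs
# inside the YARN Application Master, so this client process may exit
# once the job has been accepted. Path assumes the default MapR layout.
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /opt/mapr/spark/spark-<version>/examples/jars/spark-examples*.jar 10
```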
In yarn-cluster mode, the local directories used by the Spark executors and the Spark driver are the local directories configured for YARN (yarn.nodemanager.local-dirs). If you specify a different path with spark.local.dir (as you would for Spark running in standalone mode), that path is ignored.
Run Spark from the Spark Shell
In yarn-client mode, complete the following steps to run Spark from the Spark shell:
Navigate to the Spark on YARN installation directory:
Substitute your Spark version in the command. For example:
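The commands might look like the following; the path assumes the default MapR installation layout (an assumption, so adjust it for your environment), and the version shown in the example is illustrative.

```shell
# Change to the Spark installation directory. The path assumes the
# default MapR layout; substitute your installed Spark version.
cd /opt/mapr/spark/spark-<version>

# For example, if Spark 2.1.0 is installed:
cd /opt/mapr/spark/spark-2.1.0
```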
Issue the following command to run Spark from the Spark shell:
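A typical invocation, assuming a Spark 2.x release, would be the following (older releases used the combined `--master yarn-client` form instead of the separate `--deploy-mode` flag):

```shell
# Launch the Spark shell on YARN in client mode, from the Spark
# installation directory. The driver runs in this client process.
./bin/spark-shell --master yarn --deploy-mode client
```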
You must use yarn-client mode to run Spark from the Spark shell; yarn-cluster mode is not supported.
Run Spark Applications
For information about running Spark applications, see the Apache Spark documentation.