Configure Spark JAR Location (Spark 2.0.1 and later)

By default, Spark on YARN uses Spark JAR files that are installed locally. You can instead place the Spark JAR files in a world-readable location on MapR Filesystem, which lets YARN cache them on the nodes rather than distributing them each time an application runs. Complete the following steps to add the Spark JAR files to a world-readable location on MapR Filesystem:
  1. Create a zip archive containing all the JARs from the SPARK_HOME/jars directory. For example:
    cd /opt/mapr/spark/spark-<version>/jars/
    zip /opt/mapr/spark/spark-<version>/spark-jars.zip ./*
  2. Copy the zip file from the local file system to a world-readable location on MapR Filesystem. You can upload it to the home directory of the current user:
    hadoop fs -put /opt/mapr/spark/spark-<version>/spark-jars.zip

    For example:

    hadoop fs -put /opt/mapr/spark/spark-2.0.1/spark-jars.zip /user/mapr/
  3. Set the spark.yarn.archive property in the spark-defaults.conf file to point to the world-readable location where you added the zip file. Apply this setting on the node from which you submit your Spark jobs:
    spark.yarn.archive maprfs:///<path to zip>

    For example:

    spark.yarn.archive maprfs:///user/mapr/spark-jars.zip
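
If you script step 3, it helps to make the edit repeatable so that running it twice does not leave duplicate spark.yarn.archive lines. The following is a minimal sketch, not part of the product: the helper name set_spark_property is hypothetical, and it assumes the file uses the whitespace-separated "key value" form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Spark or MapR).
# Usage: set_spark_property FILE KEY VALUE
# Removes any existing "KEY value" line from FILE, then appends "KEY VALUE".
# Assumes whitespace-separated entries; "KEY=value" lines are not detected.
set_spark_property() {
    conf_file=$1
    key=$2
    value=$3
    tmp_file="${conf_file}.tmp.$$"

    # Create the file if it does not exist yet.
    touch "$conf_file"

    # Keep every line whose first field is not the key
    # (comment lines and other properties are preserved).
    awk -v k="$key" '$1 != k' "$conf_file" > "$tmp_file"

    # Append the new setting.
    printf '%s %s\n' "$key" "$value" >> "$tmp_file"

    # Write back in place so the original file keeps its permissions.
    cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}
```

For example, assuming spark-defaults.conf is in the conf directory under SPARK_HOME:

    set_spark_property /opt/mapr/spark/spark-2.0.1/conf/spark-defaults.conf spark.yarn.archive maprfs:///user/mapr/spark-jars.zip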