Configure Spark with the NodeManager Local Directory Set to MapR Filesystem

This procedure configures Spark to use the mounted NFS directory instead of the /tmp directory on the local file system:

  1. Install the mapr-loopbacknfs and nfs-utils packages if they are not already installed. For reference, see Installing mapr-loopbacknfs Packages and Setting Up MapR NFS.
  2. Start the mapr-loopbacknfs service by following the steps at Managing the mapr-loopbacknfs Service.
  3. To configure Spark Shuffle on NFS, complete these steps on all nodes:
    1. Create a local volume for Spark Shuffle:
      sudo -u mapr maprcli volume create -name mapr.$(hostname -f).local.spark -path /var/mapr/local/$(hostname -f)/spark -replication 1 -localvolumehost $(hostname -f)
    2. Point the NodeManager local directory to the Spark Shuffle volume mounted through NFS by setting the following property in the yarn-site.xml file on the NodeManager nodes:
      <property>
          <name>yarn.nodemanager.local-dirs</name>
          <value>/mapr/my.cluster.com/var/mapr/local/${mapr.host}/spark</value>
      </property>
      
    3. Restart the NodeManager service (and the Resource Manager service on the main node) to pick up the yarn-site.xml changes:
      maprcli node services -name nodemanager -action restart -nodes <node 1> <node 2> <node 3>
      maprcli node services -name resourcemanager -action restart