Configure Spark with the NodeManager Local Directory Set to MapR Filesystem

This procedure configures Spark to use a directory mounted through NFS instead of the /tmp directory on the local file system. Configure spill to disk to use MapR Filesystem node-local storage only if local disks are unavailable or have limited space.
  1. Install the mapr-loopbacknfs and nfs-utils packages if they are not already installed. For reference, see Installing the mapr-loopbacknfs Package and Setting Up MapR NFS.
  2. Start the mapr-loopbacknfs service by following the steps at Managing the mapr-loopbacknfs Service.
  3. To configure Spark Shuffle on NFS, complete these steps on all nodes:
    1. Create a local volume for Spark Shuffle:
      sudo -u mapr maprcli volume create -name mapr.$(hostname -f).local.spark -path /var/mapr/local/$(hostname -f)/spark -replication 1 -localvolumehost $(hostname -f)
    2. Point the NodeManager local directory to the Spark Shuffle volume mounted through NFS by setting the following property in the yarn-site.xml file on the NodeManager nodes:
      <property>
          <name>yarn.nodemanager.local-dirs</name>
          <value>/mapr/my.cluster.com/var/mapr/local/${mapr.host}/spark</value>
      </property>
      
    3. Restart the NodeManager service on the NodeManager nodes, and the ResourceManager service on the main node, so that the yarn-site.xml changes take effect:
      maprcli node services -name nodemanager -action restart -nodes <node 1> <node 2> <node 3>
      maprcli node services -name resourcemanager -action restart -nodes <ResourceManager node>
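
Before restarting the services, it can help to confirm on each node that the directory named in yarn.nodemanager.local-dirs is reachable through the NFS mount and writable. The following is a minimal sketch, not part of the product: the function names are hypothetical, the cluster name my.cluster.com is taken from the example value above, and it assumes that ${mapr.host} resolves to the same value as hostname -f on the node.

```shell
#!/bin/sh
# Hypothetical helper functions for checking the Spark Shuffle directory.
# None of these names come from MapR; they only mirror the naming used
# in the steps above.

# Print the local volume name created in step 3.1 for a given host FQDN.
spark_volume_name() {
    printf 'mapr.%s.local.spark\n' "$1"
}

# Print the NFS path that yarn.nodemanager.local-dirs is expected to
# resolve to, given a cluster name and a host FQDN.
# Assumption: ${mapr.host} expands to the host FQDN.
spark_local_dir() {
    printf '/mapr/%s/var/mapr/local/%s/spark\n' "$1" "$2"
}

# Return 0 if the directory exists and a probe file can be created in it;
# otherwise print a short reason to stderr and return 1.
check_writable_dir() {
    check_dir=$1
    if [ ! -d "$check_dir" ]; then
        echo "missing directory: $check_dir" >&2
        return 1
    fi
    check_probe="$check_dir/.nm-probe.$$"
    if ! touch "$check_probe" 2>/dev/null; then
        echo "directory not writable: $check_dir" >&2
        return 1
    fi
    rm -f "$check_probe"
}
```

A possible use on a node, run as the user that runs the NodeManager service, is: check_writable_dir "$(spark_local_dir my.cluster.com "$(hostname -f)")". A non-zero exit status suggests that the mapr-loopbacknfs mount or the local volume from step 3.1 is not in place yet.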