The core YARN services -- the NodeManager, HistoryServer, and ResourceManager -- are managed by the Warden daemon on nodes that have these roles configured. For more information on node roles, see Planning the Cluster. Configuration information for each of these services is available in its warden.<servicename>.conf file.


New Parameters for the configure.sh Script

The following new parameters are available for the configure.sh script: 

-hadoop [1|2]

This parameter specifies whether the cluster uses MapReduce 1 or MapReduce 2. Valid values for the -hadoop parameter are 1 or 2. The default value is 2. When the value of the -hadoop parameter is 2, the -RM parameter (see below) is required.

-RM <IP Address>

This parameter specifies the IP address of the node in the cluster that has the ResourceManager role. In the Beta release of YARN for the MapR distribution for Hadoop, only one node in the cluster can be configured with the ResourceManager role.

-HS <IP Address>

This optional parameter specifies the IP address or hostname of the node in the cluster that has the HistoryServer role. The HistoryServer requires the mapr-historyserver package. When the mapr-historyserver package is installed on the ResourceManager node, the default value for the -HS parameter is the IP address of the ResourceManager node. A cluster supports only one HistoryServer node.
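
For example, a node could be configured for MapReduce v2 with an invocation like the following sketch. It assumes the standard /opt/mapr/server location for configure.sh and the usual -C (CLDB nodes) and -Z (ZooKeeper nodes) options; all addresses are placeholders for your own cluster:

/opt/mapr/server/configure.sh -C 10.10.80.5 -Z 10.10.80.5 -RM 10.10.80.10 -HS 10.10.80.10 -hadoop 2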

YARN Configuration Files

The following sections provide a brief introduction to the YARN configuration files. YARN configuration is set at installation and does not require modification unless your use case calls for specific changes.

YARN configuration options are stored in the /opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop/yarn-site.xml file. MapReduce configuration options are stored in the /opt/mapr/hadoop/hadoop-<version>/etc/hadoop/mapred-site.xml file. When the configure.sh script makes changes to these configuration files, it creates timestamped backup copies of the originals.

yarn-site.xml

The configuration file /opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop/yarn-site.xml contains YARN configuration options such as the amount of memory and the number of CPUs available for applications on each node. The yarn-site.xml file also controls which scheduler is used and how it is configured. Default configuration values are loaded from the yarn-default.xml file.
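
For example, the per-node resource limits that this file carries might look like the following sketch. The property names are the standard Hadoop 2.x YARN names; the values are illustrative, not MapR defaults:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>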

mapred-site.xml

The /opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop/mapred-site.xml file contains MapReduce configuration options, many of which have changed to reflect the architecture of MapReduce 2.0. The options for the JobTracker and TaskTrackers are gone, replaced with options for the ResourceManager, NodeManagers, and JobHistoryServer. Under YARN, the concept of map slots and reduce slots has been discarded; resources are allocated dynamically in terms of memory and virtual cores.
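
As a minimal sketch of the kind of entries this file carries on a YARN cluster (your installed file most likely already sets these; the hostname is a placeholder and 10020 is the stock Hadoop default port for the JobHistoryServer):

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hs-node.example.com:10020</value>
</property>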

Note: For the FCS release of MapR 4.0.0, the MapReduce 1 roles (JobTracker and TaskTracker) are mutually exclusive with the MapReduce 2 roles (ResourceManager, NodeManager, and HistoryServer). A node cannot have both MapReduce 1 and MapReduce 2 roles.

Configuring the MapReduce Version for MapR Clients and External Applications

MapReduce jobs require the Hadoop classpath that corresponds to the MapReduce version in use.

MapReduce jobs that are run using the hadoop jar command automatically use the classpath provided by /usr/bin/hadoop. When you run configure.sh on each client machine, you set the -hadoop argument according to the MapReduce version:

  • When you specify -hadoop 1, the /usr/bin/hadoop command uses the classpath that corresponds to MapReduce v1 jobs.
  • When you specify -hadoop 2, the /usr/bin/hadoop command uses the classpath that corresponds to MapReduce v2 jobs.

To run MapReduce jobs from a program that does not use the hadoop jar command, such as an external Java application, use one of the following methods to set the Java classpath according to the MapReduce version that you want to use:

  • If the program will always run MapReduce v1 jobs, enter java -cp $(/opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop classpath):<...> my.java.program
  • If the program will always run MapReduce v2 jobs, enter java -cp $(/opt/mapr/hadoop/hadoop-2.3.0/bin/hadoop classpath):<...> my.java.program
  • If the program should always run the MapReduce version specified by the configure.sh script, enter java -cp $(hadoop classpath):<...> my.java.program

Configuring the YARN Container Size for MapReduce v2 Jobs

The YARN container size for MapReduce jobs is determined by properties in the mapred-default.xml and mapred-site.xml files. The mapred-default.xml file provides defaults that can be overridden in mapred-site.xml, and is packaged in the MapReduce client core JAR file (/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.3.0-mapr-4.0.0-FCS.jar).

The following properties in the mapred-default.xml file determine the YARN container size for each job that runs on the node:

  • The value of mapreduce.map.memory.mb determines the YARN container size for each map task.
  • The value of mapreduce.reduce.memory.mb determines the YARN container size for each reduce task.

To override the YARN container memory allocation for all MapReduce jobs that run on the node, add the following properties to mapred-site.xml with the values that you want to use:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>###</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>###</value>
</property>

To configure the YARN container sizes for an individual MapReduce job, add the following parameters when you run the job from the command line:

-Dmapreduce.map.memory.mb=<value>
-Dmapreduce.reduce.memory.mb=<value>
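
For example, a hypothetical submission that overrides both settings could look like the following. The JAR name, class name, paths, and memory values are placeholders, and the sketch assumes the job's main class uses ToolRunner (or otherwise honors GenericOptionsParser) so that the -D options are applied:

hadoop jar my-app.jar my.java.MainClass -Dmapreduce.map.memory.mb=2048 -Dmapreduce.reduce.memory.mb=4096 /input/path /output/path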

Note: The YARN container size limit, which applies to MapReduce and non-MapReduce jobs, is determined by properties in the yarn-default.xml and yarn-site.xml files. The yarn-default.xml file provides defaults that can be overridden in yarn-site.xml, and is packaged in the YARN common JAR file (/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-common-2.3.0-mapr-4.0.0-FCS.jar). When you configure the YARN container size for MapReduce jobs, verify that the size is within the limits set by yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb.
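
For reference, the allocation limits take the following shape in yarn-site.xml. This is only a sketch with illustrative values, not MapR defaults:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>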

Administering YARN with the MCS Display

The MapR Control System (MCS) Web UI for a MapR cluster with YARN displays a YARN panel.

Clicking the Running Applications or Queued Applications links opens an MCS tab that displays the ResourceManager's information about YARN applications running on the cluster.

Links to the YARN services are available from the Services pane. Clicking the links for the ResourceManager, NodeManager, or HistoryServer nodes opens the service management page for those nodes.

The MCS Navigation pane on the left side of the UI also has links to the ResourceManager and Job History Server nodes.