The core YARN services -- the NodeManager, HistoryServer, and ResourceManager -- are managed by the Warden daemon on nodes that have these roles configured. For more information on node roles, see Planning the Cluster. Configuration information for these services is available in the warden.<servicename>.conf file for each service.
New Parameters for the configure.sh Script
The following new parameters are available for the configure.sh script:
-hadoop <version>
This parameter specifies whether the cluster uses MapReduce 1 or MapReduce 2. Valid values for the -hadoop parameter are 1 or 2. The default value is 2. When the value of the -hadoop parameter is 2, the -RM parameter (see below) is required.
-RM <IP Address>
This parameter specifies the IP address of the node in the cluster that has the ResourceManager role. In the Beta release of YARN for the MapR distribution for Hadoop, only one node in the cluster can be configured with the ResourceManager role.
-HS <IP Address>
This optional parameter specifies the IP address or hostname of the node in the cluster that has the History Server role. The History Server requires the mapr-historyserver package. When the mapr-historyserver package is installed on the ResourceManager node, the default value for the -HS parameter is the IP address of the ResourceManager node. A cluster supports only one HistoryServer node.
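As an illustration only, a configure.sh run for a YARN cluster might combine these parameters as follows. All IP addresses below, and the -C (CLDB) and -Z (ZooKeeper) node lists, are placeholder values, not values taken from this document:

```shell
# Hypothetical example: configure a node for MapReduce v2,
# naming the ResourceManager (-RM) and History Server (-HS) nodes.
# All IP addresses here are placeholders.
/opt/mapr/server/configure.sh -C 10.10.100.1 -Z 10.10.100.1 \
    -hadoop 2 -RM 10.10.100.2 -HS 10.10.100.2
```

Because the mapr-historyserver package is installed on the ResourceManager node in this sketch, the -HS parameter could also be omitted and would default to the ResourceManager's address.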
YARN Configuration Files
The following sections provide a brief introduction to the YARN configuration files. YARN configuration is set at installation and rarely needs modification unless your use case requires specific changes.
YARN configuration options are stored in the /opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop/yarn-site.xml file. MapReduce configuration options for Hadoop 2.x are stored in the /opt/mapr/hadoop/hadoop-<version>/etc/hadoop/mapred-site.xml file. When the configure.sh script makes changes to these configuration files, the script creates timestamped backup versions of the files.
The configuration file /opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop/yarn-site.xml contains YARN configuration options such as the amount of memory and the number of CPUs available for applications on each node. The yarn-site.xml file also controls the selection and configuration of a given scheduler. Default configuration information is loaded from the yarn-default.xml file.
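For example, the per-node memory and CPU settings mentioned above correspond to properties like the following. The property names are standard YARN; the values shown are illustrative, not defaults from this release:

```xml
<!-- Illustrative yarn-site.xml fragment: resources this node offers to YARN. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value> <!-- memory, in MB, available for containers on this node -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value> <!-- virtual cores available for containers on this node -->
</property>
```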
The /opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop/mapred-site.xml file contains MapReduce configuration options, many of which have changed to reflect the architecture of MapReduce v2. The options for the JobTracker and TaskTrackers are gone, replaced with options for the ResourceManager, NodeManagers, and JobHistoryServer. Under YARN, the concept of map slots and reduce slots has been discarded; resources are allocated dynamically in terms of memory and virtual cores.
Configuring the MapReduce Version for MapR Clients and External Applications
MapReduce jobs require the Hadoop classpath associated with the MapReduce version in use.
MapReduce jobs that are run using the hadoop jar command automatically use the classpath within /usr/bin/hadoop. When you run configure.sh on each client machine, set the -hadoop argument according to the MapReduce version:
- When you specify -hadoop 1, the /usr/bin/hadoop directory contains classes that correspond to MapReduce v1 jobs.
- When you specify -hadoop 2, the /usr/bin/hadoop directory contains classes that correspond to MapReduce v2 jobs.
To run MapReduce jobs from a program that does not use the hadoop jar command, such as an external Java application, use one of the following methods to specify the Java classpath based on the MapReduce version that you want to use:
- If the program will always run MapReduce v1 jobs, enter:
  java -cp $(/opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop classpath):<...> my.java.program
- If the program will always run MapReduce v2 jobs, enter:
  java -cp $(/opt/mapr/hadoop/hadoop-2.3.0/bin/hadoop classpath):<...> my.java.program
- If the program should always run the MapReduce version specified by the -hadoop setting from configure.sh, enter:
  java -cp $(hadoop classpath):<...> my.java.program
Configuring the YARN Container Size for MapReduce V2 Jobs
The YARN container size for MapReduce jobs is determined by the properties in the mapred-default.xml and mapred-site.xml files. The mapred-default.xml provides defaults that can be overridden using mapred-site.xml, and is located in the Hadoop core JAR file (/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.3.0-mapr-4.0.0-FCS.jar).
The following properties in the mapred-default.xml file determine the YARN container size for each job that runs on the node:
- The value of mapreduce.map.memory.mb determines the YARN container size for each map task.
- The value of mapreduce.reduce.memory.mb determines the YARN container size for each reduce task.
To override the YARN container memory allocation for all MapReduce jobs that run on the node, add the following properties in mapred-site.xml with the values that you want to specify:
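A minimal sketch of such an override, using the two properties named above; the values shown are illustrative, not recommendations:

```xml
<!-- In /opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop/mapred-site.xml -->
<!-- Illustrative values; choose sizes appropriate for your workload. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container size, in MB, for each map task -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value> <!-- container size, in MB, for each reduce task -->
</property>
```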
To configure the YARN container sizes for a MapReduce job, you can add the following parameters when you run the job from the command line:
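For example, assuming the job's driver class uses the standard Hadoop ToolRunner (which accepts -D generic options), the same two properties can be passed on the command line. The JAR name, class name, and paths below are hypothetical:

```shell
# Hypothetical invocation: per-job override of YARN container sizes.
# Assumes com.example.MyJob parses -D options via ToolRunner.
hadoop jar my-app.jar com.example.MyJob \
    -Dmapreduce.map.memory.mb=2048 \
    -Dmapreduce.reduce.memory.mb=4096 \
    /input/path /output/path
```

Properties set this way apply only to that job run and take precedence over the values in mapred-site.xml.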
Note: The YARN container size limit, which applies to MapReduce and non-MapReduce jobs, is determined by properties in the yarn-default.xml and yarn-site.xml files. The yarn-default.xml file provides defaults that can be overridden using yarn-site.xml, and is located in the Hadoop core JAR file (/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-common-2.3.0-mapr-4.0.0-FCS.jar). When you configure the YARN container size for MapReduce jobs, verify that the size is within the limits set by these files.
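For reference, the upper bound on container size is commonly controlled by a property such as the following in yarn-site.xml. The property name is standard YARN; the value is illustrative:

```xml
<!-- Illustrative: container requests above this limit are capped by the scheduler. -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- maximum container size, in MB -->
</property>
```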
Administering YARN with the MCS Display
The MapR Control System (MCS) Web UI for a MapR cluster with YARN displays the following panel:
Clicking the ‘Running Applications’ or ‘Queued Applications’ links will open an MCS tab that displays the ResourceManager’s information about YARN applications running on the cluster.
Links to the YARN services are available from the Services pane:
Clicking the links for the ResourceManager, NodeManager, or HistoryServer nodes will open the service management page for those nodes.
The MCS Navigation pane at the left side of the UI also has links to the ResourceManager and Job History Server nodes.