Warden determines the percentage of node resources available for MapReduce v1 jobs and for YARN applications based on settings in the warden.conf file. Applications include MapReduce v2 applications and non-MapReduce applications such as Spark.
Note: If you modify the values in warden.conf, you must restart Warden.
The percentage of resources allocated for YARN applications and MapReduce v1 jobs is based on the values of the following parameters in warden.conf:
Parameter | Default | Description |
---|---|---|
mr1.memory.percent | 50 | The percentage of memory allocated to MapReduce v1 jobs. The remaining memory is allocated to applications. |
mr1.cpu.percent | 50 | The percentage of CPUs allocated to MapReduce v1 jobs. The remaining CPUs are allocated to applications. |
mr1.disk.percent | 50 | The percentage of disks allocated to MapReduce v1 jobs. The remaining disks are allocated to applications. |
These values apply only when both the TaskTracker and NodeManager roles are installed on a node. For example, if TaskTracker is not installed on the node, NodeManager gets 100% of the resources available to process applications regardless of the warden.conf settings. Similarly, if NodeManager is not installed on the node, TaskTracker gets 100% of the resources available to process MapReduce jobs regardless of the warden.conf settings.
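The percentage split itself is straightforward arithmetic. The following is an illustrative sketch only (not Warden's actual code), assuming both TaskTracker and NodeManager are installed:

```python
def split_resources(total: int, mr1_percent: int = 50) -> tuple[int, int]:
    """Split a node's resource total (memory in MB, CPUs, or disks)
    between MapReduce v1 jobs and YARN applications, the way the
    mr1.memory.percent / mr1.cpu.percent / mr1.disk.percent settings
    are applied. If only one of TaskTracker or NodeManager is
    installed, that role gets 100% instead."""
    mr1_share = total * mr1_percent // 100
    return mr1_share, total - mr1_share

# A node with 48 GB (49152 MB) and the default 50/50 split:
mr1_mem, yarn_mem = split_resources(49152)
```

With the default of 50, a 48 GB node yields 24 GB for MapReduce v1 jobs and 24 GB for YARN applications.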
This section includes the following topics:
- YARN Container Resources
- YARN Container Resources for MapReduce v2 Applications
- MapReduce v1 Job Resource Allocation
- Customizing the MapReduce v1 Slot Calculation Parameters
YARN Container Resources
A YARN application can be a MapReduce v2 application or a non-MapReduce application. The Warden on each node calculates the resources that can be allocated to process YARN applications. Each application has an Application Master that negotiates YARN container resources. For MapReduce applications, YARN processes each map or reduce task in a container.
The Application Master requests resources from the ResourceManager based on memory, CPU, and disk requirements for the YARN containers. For YARN containers that process MapReduce v2 tasks, there are additional considerations. See YARN Container Resources for MapReduce v2 Applications for details.
The Application Master requests YARN container resources based on the values of the following parameters:
Parameter | Default | Description |
---|---|---|
yarn.scheduler.minimum-allocation-mb | 1024 | Defines the minimum memory allocation available for a container in MB. To change the value, edit the yarn-site.xml file for the node that runs the ResourceManager. Assign the new value to this property, then restart the ResourceManager. |
yarn.scheduler.maximum-allocation-mb | 8192 | Defines the maximum memory allocation available for a container in MB. To change the value, edit the yarn-site.xml file for the node that runs the ResourceManager. Assign the new value to this property, then restart the ResourceManager. |
yarn.nodemanager.resource.memory-mb | Variable. This value is calculated by Warden. | Defines the memory in MB available to process YARN containers on the node. Warden calculates this value based on the resources remaining after other services are allocated. To determine the value, go to the ResourceManager UI and view the memory available for that node. |
yarn.nodemanager.resource.cpu-vcores | Variable. This value is calculated by Warden. | Defines the number of CPUs available to process YARN containers on this node. Warden calculates this value based on the resources remaining after other services are allocated. To determine the value, go to the ResourceManager UI or the YARN pane on the MCS and view the number of CPUs available for that node. To change the value, edit the yarn-site.xml file for the node, assign the new value to this property, then restart the NodeManager. |
yarn.nodemanager.resource.io-spindles | Variable. This value is calculated by Warden. | Defines the number of disks available to process YARN containers. Warden calculates this value based on the disks remaining after other services are allocated. To determine the value, go to the ResourceManager UI or the YARN pane in the MCS and view the disk information for this node. |
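The way a container request interacts with the minimum and maximum allocation settings can be sketched as follows. This is a simplified illustration, not the scheduler's actual implementation; the exact rounding behavior varies by scheduler:

```python
import math

def normalize_request(requested_mb: int,
                      minimum_mb: int = 1024,
                      maximum_mb: int = 8192) -> int:
    """Normalize a container memory request against the
    yarn.scheduler.minimum-allocation-mb and
    yarn.scheduler.maximum-allocation-mb settings: round the request
    up to a multiple of the minimum allocation, then clamp it to the
    configured maximum (simplified sketch)."""
    rounded = math.ceil(requested_mb / minimum_mb) * minimum_mb
    return min(max(rounded, minimum_mb), maximum_mb)
```

For example, a 3500 MB request is rounded up to 4096 MB, and a 20000 MB request is capped at the 8192 MB maximum.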
YARN Container Resources for MapReduce v2 Applications
In addition to the YARN container resource allocation parameters, the MapReduce ApplicationMaster also considers the following container requirements when it sends requests to the ResourceManager for containers to run MapReduce jobs:
Parameter | Default | Description |
---|---|---|
mapreduce.map.memory.mb | 1024 | Defines the container size for map tasks in MB. |
mapreduce.reduce.memory.mb | 3072 | Defines the container size for reduce tasks in MB. |
mapreduce.reduce.java.opts | -Xmx2560m | Java options for reduce tasks. |
mapreduce.map.java.opts | -Xmx900m | Java options for map tasks. |
mapreduce.map.disk | 0.5 | Defines the number of disks a map task requires. |
mapreduce.reduce.disk | 1.33 | Defines the number of disks that a reduce task requires. |
You can use one of the following methods to change the default configuration:
- Provide updated values in the mapred-site.xml file on the node that runs the job. You can use central configuration to change this value on each node that runs the NodeManager in the cluster. Then, restart the NodeManager on each node in the cluster. The mapred-site.xml file for MapReduce v2 applications is located in the following directory: /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop
- Override the default values from the command line for each application that requires a non-default value.
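When changing these defaults, the -Xmx heap in the java.opts setting must fit inside the corresponding container size, or the NodeManager can kill the task for exceeding its memory limit. A quick consistency check, sketched under the assumption that an -Xmx flag is present in the opts string:

```python
import re

def heap_fits_container(java_opts: str, container_mb: int) -> bool:
    """Return True if the -Xmx heap requested in a
    mapreduce.*.java.opts string fits inside the corresponding
    mapreduce.*.memory.mb container size."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    heap_mb = int(match.group(1)) * (1024 if match.group(2).lower() == "g" else 1)
    return heap_mb <= container_mb
```

The shipped defaults pass this check: -Xmx900m fits in the 1024 MB map container, and -Xmx2560m fits in the 3072 MB reduce container.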
MapReduce v1 Job Resource Allocation
When a MapReduce v1 job is submitted to the JobTracker, the JobTracker determines which TaskTracker nodes can process the map and reduce tasks based on the available map and reduce slots. Map and reduce slots are allocated based on the memory available to process MapReduce v1 jobs and the number of CPUs and disks available to MapR-FS.
In general, you should not need to customize the number of map and reduce slots. However, you can configure the parameters that are used to calculate the values. For more information, see Customizing the MapReduce v1 Slot Calculation Parameters.
Criteria for Map Slot Calculation
MapR Hadoop sets the number of map slots to the lowest value that results from the following memory, CPU, and disk calculations:
- Memory calculation = (0.4 * memory available to process MapReduce v1 tasks) / memory for each map slot
- CPU calculation:
  - If the number of CPUs on the node > 2, the CPU calculation = number of CPUs on the node - the number of CPUs assigned to MapR-FS. Note: The number of CPUs available to MapR-FS is 4 for an Enterprise Database Edition installation. Otherwise, the value is 2.
  - If the number of CPUs on the node <= 2, the CPU calculation = 1.
- Disk calculation = 2 * the number of disks available to MapR-FS
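The map slot criteria above can be sketched as a small function (an illustrative sketch of the calculation, not MapR Hadoop's actual code):

```python
def map_slots(mr1_memory_mb: int, map_slot_mb: int,
              cpus: int, maprfs_cpus: int, disks: int) -> int:
    """Number of map slots: the lowest of the memory, CPU, and disk
    criteria. maprfs_cpus is the number of CPUs assigned to MapR-FS
    (4 for Enterprise Database Edition, otherwise 2)."""
    memory_calc = int(0.4 * mr1_memory_mb / map_slot_mb)
    cpu_calc = cpus - maprfs_cpus if cpus > 2 else 1
    disk_calc = 2 * disks
    return min(memory_calc, cpu_calc, disk_calc)
```

For a node with 26 GB for MapReduce v1 tasks, 1 GB map slots, 24 CPUs (4 assigned to MapR-FS), and 5 disks, this returns min(10, 20, 10) = 10.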
Criteria for Reduce Slot Calculation
MapR Hadoop sets the number of reduce slots to the lowest value that results from the following memory, CPU, and disk calculations:
- Memory calculation = (0.6 * memory available to process MapReduce v1 tasks) / memory allocated to each reduce slot
- CPU calculation:
  - If the number of CPUs on the node > 2, the CPU calculation = number of CPUs on the node - the number of CPUs assigned to MapR-FS. Note: The number of CPUs available to MapR-FS is 4 for an Enterprise Database Edition installation. Otherwise, the value is 2.
  - If the number of CPUs on the node <= 2, the CPU calculation = 1.
- Disk calculation:
  - If the number of disks available to MapR-FS > 2, the disk calculation = 0.75 * the number of disks available to MapR-FS.
  - If the number of disks available to MapR-FS <= 2, the disk calculation = 1.
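The reduce slot criteria can be sketched the same way (illustrative only, not MapR Hadoop's actual code):

```python
def reduce_slots(mr1_memory_mb: int, reduce_slot_mb: int,
                 cpus: int, maprfs_cpus: int, disks: int) -> int:
    """Number of reduce slots: the lowest of the memory, CPU, and
    disk criteria. maprfs_cpus is the number of CPUs assigned to
    MapR-FS (4 for Enterprise Database Edition, otherwise 2)."""
    memory_calc = int(0.6 * mr1_memory_mb / reduce_slot_mb)
    cpu_calc = cpus - maprfs_cpus if cpus > 2 else 1
    disk_calc = int(0.75 * disks) if disks > 2 else 1
    return min(memory_calc, cpu_calc, disk_calc)
```

For a node with 26 GB for MapReduce v1 tasks, 3 GB reduce slots, 24 CPUs (4 assigned to MapR-FS), and 5 disks, this returns min(5, 20, 3) = 3.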
Example Map and Reduce Slot Calculation
In the following example, the node has the following configuration:
Node Resources or Settings | Values |
---|---|
Services and Options | TaskTracker, MapR-FS, MapR-DB |
CPU/Core | 24 |
Disks Available to MapR-FS | 5 |
RAM | 48G |
Chunk Size | 256MB |
Based on this configuration, MapR Hadoop performs the following calculations to determine the number of map and reduce slots:
Calculation | Value | Description |
---|---|---|
CPUs assigned to MapR-FS | 4 | Since MapR-DB is running, 4 CPUs are assigned to MapR-FS and subtracted in the slot calculation. |
Memory for Map Slots | 1G | Since the chunk size is 256 MB, 1G is allocated to memory for each map slot. |
Memory for Reduce Slots | 3G | Since the chunk size is 256 MB, 3G is allocated to memory for each reduce slot. |
Memory available to process MapReduce v1 tasks | 26G | Based on the services running on the node, Warden calculates the memory available to process MapReduce v1 tasks. |
Map Slots | 10 | This value is based on the following calculation: Min [ (0.4 * 26G) / 1G, (24 - 4), (2 * 5) ] = Min [ 10.4, 20, 10 ] = 10. Warden sets mapred.tasktracker.map.tasks.maximum to 10. |
Reduce Slots | 3 | This value is based on the following calculation: Min [ (0.6 * 26G) / 3G, (24 - 4), (0.75 * 5) ] = Min [ 5.2, 20, 3.75 ] = 3. Warden sets mapred.tasktracker.reduce.tasks.maximum to 3. |
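The example works out as follows in a self-contained sketch, using the node values from the tables above:

```python
# Values from the example node: 26 GB for MapReduce v1 tasks, 1 GB per
# map slot and 3 GB per reduce slot (256 MB chunk size), 24 CPUs with
# 4 assigned to MapR-FS (MapR-DB is running), and 5 disks.
mem_gb, map_slot_gb, reduce_slot_gb = 26, 1, 3
cpus, maprfs_cpus, disks = 24, 4, 5

map_slots = min(int(0.4 * mem_gb / map_slot_gb),        # memory: 10
                cpus - maprfs_cpus,                      # CPU: 20
                2 * disks)                               # disk: 10
reduce_slots = min(int(0.6 * mem_gb / reduce_slot_gb),   # memory: 5
                   cpus - maprfs_cpus,                   # CPU: 20
                   int(0.75 * disks))                    # disk: 3
print(map_slots, reduce_slots)  # 10 3
```

In each case the memory, CPU, and disk criteria are evaluated and the lowest result wins: 10 map slots (memory and disk tie) and 3 reduce slots (disk is the constraint).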
Customizing the MapReduce v1 Slot Calculation Parameters
In general, you do not need to change the calculated number of slots. However, you can override the number of slots by adding one or more of these parameters to mapred-site.xml. The mapred-site.xml file for MapReduce v1 jobs is in the following location: /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml
Note: If you make changes to mapred-site.xml, you must restart TaskTracker.
Warden uses the following parameters to calculate and assign values to map slots and reduce slots on each node:
Parameter | Default Value | Description |
---|---|---|
| Warden calculates this value based on the memory remaining after other services are allocated. | Defines the memory available to process MapReduce v1 tasks in MB. |
mapred.tasktracker.map.tasks.maximum | Warden uses a formula to calculate this value. For more information, see Criteria for Map Slot Calculation. | Defines the maximum number of MapReduce v1 map slots. |
mapred.tasktracker.reduce.tasks.maximum | Warden uses a formula to calculate this value. For more information, see Criteria for Reduce Slot Calculation. | Defines the maximum number of MapReduce v1 reduce slots. |
mapred.job.map.memory.physical.mb | If the chunk size is greater than or equal to 256M, then this value is set to 1G. Otherwise, this value is set to 0.5G. | Defines the amount of memory allocated to map tasks in MB. |
mapred.job.reduce.memory.physical.mb | If the chunk size is greater than or equal to 256M, then this value is set to 3G. Otherwise, this value is set to 1.5G. | Defines the amount of memory allocated to reduce tasks in MB. |
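The chunk-size-dependent defaults for the last two parameters can be summarized in a small sketch (illustrative only):

```python
def mr1_task_memory_defaults(chunk_size_mb: int) -> tuple[int, int]:
    """Default (map, reduce) task memory in MB, chosen from the chunk
    size as described for mapred.job.map.memory.physical.mb and
    mapred.job.reduce.memory.physical.mb: 1G/3G when the chunk size
    is at least 256M, otherwise 0.5G/1.5G."""
    if chunk_size_mb >= 256:
        return 1024, 3072
    return 512, 1536
```

A cluster with the default 256 MB chunk size therefore gets 1 GB map tasks and 3 GB reduce tasks, which matches the map and reduce slot memory used in the example calculation above.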