TaskTracker Configuration

When changing any parameters in this section, a TaskTracker restart is required.

Warning: When mapreduce.tasktracker.prefetch.maptasks is greater than 0, you must disable Fair Scheduler with preemption and label-based job placement.
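As a rough illustration of how one of the parameters below could be expressed as a Hadoop-style property entry, the following sketch renders a single `<property>` element. It assumes the TaskTracker reads these parameters from mapred-site.xml; the helper name `render_property` and the chosen value of 48 are hypothetical.

```python
import xml.etree.ElementTree as ET


def render_property(name, value):
    """Return a Hadoop-style <property> element as a string.

    Hypothetical helper for illustration; it only formats text and does
    not touch any real configuration file.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


# Example value only: retain user logs for 48 hours instead of the default 24.
snippet = render_property("mapred.userlog.retain.hours", 48)
print(snippet)
```

The rendered element would sit inside the `<configuration>` root of the file, and the TaskTracker would still need a restart to pick up the change.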

Parameter

Description

mapred.tasktracker.map.tasks.maximum

The maximum number of map task slots to run simultaneously. The default value of -1 specifies that the number of map task slots is based on the total amount of memory reserved for MapReduce by the Warden. For more information, see Resource Allocation for Jobs and Applications.

Default value: -1

mapreduce.tasktracker.prefetch.maptasks

The proportion of map tasks that can be scheduled in advance (prefetched) on a TaskTracker. The number is given as a ratio of prefetched tasks to the total number of map slots. For example, 0.25 means the number of prefetched tasks = 25% of the total number of map slots. The default is 0.0, which means no prefetched tasks can be scheduled.

Default value: 0.0
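The ratio described above can be turned into a task count with a short calculation. This is a minimal sketch: the function name is hypothetical, and rounding down to a whole task is an assumption, since the description does not say how fractional results are handled.

```python
import math


def prefetched_map_tasks(map_slots, prefetch_ratio):
    """Estimate how many map tasks may be prefetched on one TaskTracker.

    Assumption: fractional results are rounded down; the real TaskTracker
    may round differently.
    """
    if map_slots < 0 or prefetch_ratio < 0:
        raise ValueError("map_slots and prefetch_ratio must be non-negative")
    return math.floor(map_slots * prefetch_ratio)


print(prefetched_map_tasks(8, 0.25))  # 25% of 8 map slots
print(prefetched_map_tasks(8, 0.0))   # default ratio: nothing is prefetched
```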

mapreduce.tasktracker.reserved.physicalmemory.mb.low

This property's value sets the target memory usage level that the TaskTracker aims for when it kills tasks to reduce total memory usage. The value is a fraction of the mapreduce.tasktracker.reserved.physicalmemory.mb value; for example, 0.8 means 80%.

Default value: 0.8
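To make the relationship between the two memory properties concrete, the sketch below computes the target level and how much memory would have to be freed. The function name is hypothetical, and the trigger condition (usage above the reserved limit) is an assumption drawn from the description of mapreduce.tasktracker.reserved.physicalmemory.mb.

```python
def memory_kill_plan(used_mb, reserved_mb, low_fraction=0.8):
    """Return (target_mb, mb_to_free) for a TaskTracker.

    Assumptions: killing starts only when usage exceeds reserved_mb, and
    continues until usage drops to reserved_mb * low_fraction.
    """
    if reserved_mb < 0:
        # -1 disables physical memory accounting, so there is nothing to plan.
        raise ValueError("memory accounting is disabled for negative limits")
    target_mb = reserved_mb * low_fraction
    if used_mb <= reserved_mb:
        return target_mb, 0.0
    return target_mb, used_mb - target_mb


target, to_free = memory_kill_plan(used_mb=4500, reserved_mb=4096)
print(target, to_free)
```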

mapreduce.tasktracker.task.slowlaunch

Set this property's value to True to make the TaskTracker wait after each task launch on nodes running critical services such as CLDB, JobTracker, and ZooKeeper.

Default value: False

mapreduce.tasktracker.volume.healthcheck.interval

This property's value defines the interval in milliseconds at which the TaskTracker checks the MapReduce volume at ${mapr.localvolumes.path}/mapred/.

Default value: 60000

mapreduce.use.maprfs

Use MapR-FS for shuffle and sort/merge.

Default value: True

mapred.userlog.retain.hours

This property's value specifies the maximum time, in hours, to retain the user-logs after job completion.

Default value: 24

mapred.userlog.retain.hours.max

This property's value specifies the highest legal value for the mapred.userlog.retain.hours property. When a user specifies a value for mapred.userlog.retain.hours in excess of the value of the mapred.userlog.retain.hours.max property, that value is ignored and the value of the mapred.userlog.retain.hours.max property is used instead.

Default value: 168
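The capping rule described above amounts to taking the smaller of the two values. A minimal sketch, with a hypothetical function name and the documented defaults as parameter defaults:

```python
def effective_retain_hours(requested_hours=24, max_hours=168):
    """Apply the mapred.userlog.retain.hours.max cap to a requested value."""
    return min(requested_hours, max_hours)


print(effective_retain_hours(24))   # within the cap, used as given
print(effective_retain_hours(500))  # exceeds the cap, replaced by the maximum
```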

mapred.user.jobconf.limit

The maximum allowed size of the user jobconf. The default is set to 5 MB.

Default value: 5242880

mapred.userlog.limit.kb

Deprecated: The maximum size of user-logs of each task in KB. 0 disables the cap.

Default value: 0

mapreduce.use.fastreduce

Expert: Merge map outputs without copying.

Default value: False

mapred.tasktracker.reduce.tasks.maximum

The maximum number of reduce task slots to run simultaneously. The default value of -1 specifies that the number of reduce task slots is based on the total amount of memory reserved for MapReduce by the Warden. For more information, see Resource Allocation for Jobs and Applications.

Default value: -1

mapred.tasktracker.ephemeral.tasks.maximum

The number of slots reserved for scheduling small jobs.

Default value: 1

mapred.tasktracker.ephemeral.tasks.timeout

The maximum time in milliseconds that a task is allowed to occupy an ephemeral slot.

Default value: 10000

mapred.tasktracker.ephemeral.tasks.ulimit

The ulimit, in bytes, applied to all tasks scheduled on an ephemeral slot.

Default value: 4294967296

mapreduce.tasktracker.reserved.physicalmemory.mb

The maximum physical memory that the TaskTracker should reserve for MapReduce tasks. If tasks use more than this limit, the task using the most memory is killed. Expert only: Set this value only if the TaskTracker should use a specific amount of memory for MapReduce tasks. In the MapR distribution, the Warden calculates this number based on the services configured on a node. Setting mapreduce.tasktracker.reserved.physicalmemory.mb to -1 disables physical memory accounting and task management.

mapred.tasktracker.expiry.interval

Expert: This property's value specifies a time interval in milliseconds. After this interval expires without any heartbeats sent, a TaskTracker is marked lost.

Default value: 600000

mapreduce.tasktracker.heapbased.memory.management

Expert only: Use this option to prevent swapping by limiting the number of tasks launched. A task's memory usage is based on its maximum Java heap size (-Xmx). By default, the TaskTracker computes -Xmx from the number of slots and the memory reserved for MapReduce tasks. See mapred.map.child.java.opts and mapred.reduce.child.java.opts.

Default value: false

mapreduce.tasktracker.jvm.idle.time

If a JVM is idle for longer than this value (in milliseconds), the TaskTracker kills it.

Default value: 10000

mapred.max.tracker.failures

The number of task failures of a given job on a TaskTracker after which new tasks of that job are no longer assigned to that TaskTracker.

Default value: 4

mapred.max.tracker.blacklists

The number of times a TaskTracker must be blacklisted by individual jobs before it can be blacklisted across all jobs. A blacklisted TaskTracker is given tasks again later (after a day) and becomes healthy after a restart.

Default value: 4
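The interaction between mapred.max.tracker.failures and mapred.max.tracker.blacklists can be pictured with a small bookkeeping sketch. The class and method names are hypothetical, and the model uses plain thresholds only; the JobTracker may apply additional criteria before blacklisting a TaskTracker across all jobs.

```python
from collections import defaultdict


class TrackerFaultLedger:
    """Hypothetical per-TaskTracker failure bookkeeping (not JobTracker code)."""

    def __init__(self, max_tracker_failures=4, max_tracker_blacklists=4):
        self.max_tracker_failures = max_tracker_failures
        self.max_tracker_blacklists = max_tracker_blacklists
        self.failures_by_job = defaultdict(int)

    def record_task_failure(self, job_id):
        """Count one failed task of job_id on this TaskTracker."""
        self.failures_by_job[job_id] += 1

    def blacklisted_for_job(self, job_id):
        """True once the job has seen enough failures on this TaskTracker."""
        return self.failures_by_job.get(job_id, 0) >= self.max_tracker_failures

    def blacklist_count(self):
        """Number of jobs that have blacklisted this TaskTracker."""
        return sum(
            1 for count in self.failures_by_job.values()
            if count >= self.max_tracker_failures
        )

    def eligible_for_global_blacklist(self):
        """True once enough jobs have blacklisted this TaskTracker."""
        return self.blacklist_count() >= self.max_tracker_blacklists


ledger = TrackerFaultLedger()
for _ in range(4):
    ledger.record_task_failure("job_a")
print(ledger.blacklisted_for_job("job_a"), ledger.eligible_for_global_blacklist())
```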

mapred.task.tracker.http.address

This property's value specifies the HTTP server address and port for the TaskTracker. Specify 0 as the port to make the server start on a free port.

Default value: 0.0.0.0:50060

mapred.task.tracker.report.address

Expert only: The IP address and port that the TaskTracker server listens on. Because only the tasks connect to it, it uses the local interface. Change this value only if your host does not have a loopback interface.

Default value: 127.0.0.1:0

mapreduce.tasktracker.group

Expert: The group to which the TaskTracker belongs. If LinuxTaskController is configured via the mapreduce.tasktracker.taskcontroller value, the group owner of the task-controller binary $HADOOP_HOME/bin/platform/bin/task-controller must be the same as this group.

Default value: mapr

mapred.tasktracker.task-controller.config.overwrite

The LinuxTaskController needs a configuration file at $HADOOP_HOME/conf/taskcontroller.cfg. The configuration file takes the following parameters:

  • mapred.local.dir = Local directory used by the TaskTracker, taken from mapred-site.xml.
  • hadoop.log.dir = Hadoop log directory, taken from the system properties of the TaskTracker process.
  • mapreduce.tasktracker.group = Groups allowed to run the TaskTracker; see mapreduce.tasktracker.group.
  • min.user.id = Users with a uid below this value are not allowed to launch tasks.
  • banned.users = Users who are not allowed to launch any tasks.

If this property is set to true, the TaskTracker always overwrites the configuration file with the following default values: min.user.id = -1 (check disabled), banned.users = bin, mapreduce.tasktracker.group = root. To disable this behavior and use a custom configuration, set this property's value to False and restart the TaskTracker.

Default value: true

mapred.tasktracker.indexcache.mb

This property's value specifies the maximum amount of memory allocated by the TaskTracker for the index cache. The index cache is used when the TaskTracker serves map outputs to reducers.

Default value: 10

mapred.tasktracker.instrumentation

Expert: The instrumentation class to associate with each TaskTracker.

Default value: org.apache.hadoop.mapred.TaskTrackerMetricsInst

mapred.task.tracker.task-controller

This property's value specifies the TaskController that launches and manages task execution.

Default value: org.apache.hadoop.mapred.LinuxTaskController

mapred.tasktracker.taskmemorymanager.killtask.maxRSS

Set this property's value to True to kill the tasks that are using the most memory when the total memory used by MapReduce tasks exceeds the limit specified in the TaskTracker's mapreduce.tasktracker.reserved.physicalmemory.mb property. When this property is False, tasks are killed in most-recently-launched order.

Default value: False

mapred.tasktracker.taskmemorymanager.monitoring-interval

This property's value specifies the interval in milliseconds that the TaskTracker waits between checks of its tasks' memory usage. This property is used only when task memory management is enabled by setting the property mapred.tasktracker.tasks.maxmemory to True.

Default value: 3000

mapred.tasktracker.tasks.sleeptime-before-sigkill

This property's value sets the time in milliseconds that the TaskTracker waits before sending a SIGKILL to a process after it has been sent a SIGTERM.

Default value: 5000
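The SIGTERM-then-SIGKILL sequence described above can be sketched for a generic child process as follows. This illustrates the pattern only and is not the TaskTracker's implementation; the function name `stop_gracefully` is hypothetical, and the signal behavior noted in the comments applies to POSIX systems.

```python
import subprocess
import sys


def stop_gracefully(proc, grace_ms=5000):
    """Ask a child process to exit, then force it after grace_ms milliseconds."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_ms / 1000.0)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for a task process: a child that sleeps for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", stop_gracefully(child, grace_ms=5000))
```

A longer grace period gives tasks more time to clean up after SIGTERM, at the cost of holding their resources longer before the forced kill.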

mapred.temp.dir

A shared directory for temporary files.

Default value: ${hadoop.tmp.dir}/mapred/temp

mapreduce.cluster.map.userlog.retain-size

This property's value specifies the number of bytes to retain from map task logs. The default value of -1 disables this feature.

mapreduce.cluster.reduce.userlog.retain-size

This property's value specifies the number of bytes to retain from reduce task logs. The default value of -1 disables this feature.

mapreduce.heartbeat.10000

This property's value specifies the heartbeat time in milliseconds for a cluster of 1001 to 10000 nodes. The heartbeat scales linearly between 10 s and 100 s.

Default value: 100000

mapreduce.heartbeat.1000

This property's value specifies the heartbeat time in milliseconds for a cluster of 101 to 1000 nodes. The heartbeat scales linearly between 1 s and 10 s.

Default value: 10000

mapreduce.heartbeat.100

This property's value specifies the heartbeat time in milliseconds for a cluster of 11 to 100 nodes. The heartbeat scales linearly between 300 ms and 1 s.

Default value: 1000

mapreduce.heartbeat.10

This property's value specifies the heartbeat time in milliseconds for a cluster of 1 to 10 nodes.

Default value: 300
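One way to read the four mapreduce.heartbeat.* properties together is as anchor points of a piecewise-linear function of cluster size. The sketch below assumes the anchors sit at 10, 100, 1000, and 10000 nodes with the default values of 300, 1000, 10000, and 100000 ms, and that clusters larger than 10000 nodes keep the last value; the exact interpolation used by the JobTracker is not stated here, so treat this as an approximation.

```python
# (cluster size, heartbeat in ms) taken from the default property values.
ANCHORS = [(10, 300), (100, 1000), (1000, 10000), (10000, 100000)]


def heartbeat_ms(nodes):
    """Estimate the heartbeat interval for a cluster of the given size.

    Assumption: linear interpolation between neighboring anchor points.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if nodes <= ANCHORS[0][0]:
        return float(ANCHORS[0][1])
    for (n0, h0), (n1, h1) in zip(ANCHORS, ANCHORS[1:]):
        if nodes <= n1:
            return h0 + (nodes - n0) * (h1 - h0) / (n1 - n0)
    # Assumption: beyond the largest anchor, keep the largest heartbeat.
    return float(ANCHORS[-1][1])


for size in (5, 55, 550, 10000):
    print(size, heartbeat_ms(size))
```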

mapreduce.job.complete.cancel.delegation.tokens

Set this property's value to False to keep delegation tokens from being unregistered or canceled from renewal when a job completes.

Default value: True

mapreduce.jobtracker.inline.setup.cleanup

Set this property's value to True to make the JobTracker attempt to set up and clean up the job by itself. When the value is False, setup and cleanup are done in setup/cleanup tasks.

Default value: False