The Job Metrics Database

Metrics information is kept in a MySQL database that you configure when you install MapR. The Metrics database provides the following standard tables:
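+-----------------------------------+
| Tables_in_metrics                 |
+-----------------------------------+
| JOB                               |
| JOB_ATTRIBUTES                    |
| JOB_EVENT                         |
| METRIC_TRANSACTION                |
| NODE                              |
| TASK                              |
| TASK_ATTEMPT                      |
| TASK_ATTEMPT_EVENT                |
| TASK_EVENT                        |
+-----------------------------------+
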
  • The JOB and JOB_ATTRIBUTES tables hold job metadata while a job is running. Information from the JOB and JOB_ATTRIBUTES tables is written to the /var/mapr/<cluster name>/mapred/jobTracker/jobs/history/ directory. If the MCS requests a job that has already been purged from the Metrics database, the job's data is reloaded from that directory.
  • The METRIC_TRANSACTION_* tables hold job transaction data such as counters. The transactional data is written to the /var/mapr/<cluster name>/mapred/jobTracker/jobs/history/metrics/ directory on each host. This directory depends on the base path of the JobTracker directory. These transactional data files are named <hostname>job<jobID>_<fileID>_metrics.
  • The NODE table holds information about the node ID, hostname, host ID, cluster ID, and creation time.
  • The TASK, TASK_ATTEMPT, and TASK_ATTEMPT_ATTRIBUTES tables hold information related to a job's tasks and task attempts. These tables update while the job is running.

If a job's task data has not been accessed within a configurable time limit, the data from the TASK, TASK_ATTEMPT, and TASK_ATTEMPT_ATTRIBUTES tables is purged. The db.joblastaccessed.limit.hours parameter in the db.conf file sets the number of hours that define this time limit. The default value for this parameter is 48.
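
The purge rule above can be sketched in a few lines of Python. This is only an illustration, not MapR's implementation: it assumes that db.conf uses plain key=value lines, and the helper names read_limit_hours and is_purge_eligible, the sample file contents, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_LIMIT_HOURS = 48  # default value stated in the text above


def read_limit_hours(conf_text):
    """Return db.joblastaccessed.limit.hours from key=value text, or the default.

    Assumption: db.conf holds one key=value pair per line.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "db.joblastaccessed.limit.hours":
            return int(value.strip())
    return DEFAULT_LIMIT_HOURS


def is_purge_eligible(last_accessed, now, limit_hours):
    """True when the task data has gone unaccessed for longer than the limit."""
    return now - last_accessed > timedelta(hours=limit_hours)


# Hypothetical db.conf contents that raise the limit to 72 hours.
sample_conf = """
# sample settings for illustration only
db.joblastaccessed.limit.hours=72
"""

limit = read_limit_hours(sample_conf)
now = datetime(2024, 1, 10, 12, 0)

# Last accessed 96 hours ago: past the 72-hour limit.
print(is_purge_eligible(datetime(2024, 1, 6, 12, 0), now, limit))
# Last accessed 48 hours ago: still within the 72-hour limit.
print(is_purge_eligible(datetime(2024, 1, 8, 12, 0), now, limit))
```

With the sample setting of 72 hours, the first job's task data would be eligible for purging and the second would not. If the parameter is absent, the sketch falls back to the documented default of 48 hours.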

The job metrics cover the following categories:
  • Cluster resource use (CPU and memory)
  • Duration (epoch)
  • Task count (map, reduce, failed map, failed reduce)
  • Map rates (record input and output, byte input and output)
  • Reduce rates (record input and output, shuffle bytes)
  • Task attempt counts (map, reduce, failed map, failed reduce)
  • Task attempt durations (average map, average reduce, maximum map, maximum reduce)

The task attempt metrics cover the following categories:
  • Times (task attempt duration, garbage collection time, CPU time)
  • Local byte rate (read and written)
  • MapR-FS byte rate (read and written)
  • Memory usage (bytes of physical and virtual memory)
  • Record rates (map input, map output, reduce input, reduce output, skipped, spilled, combined input, combined output)
  • Reduce task attempt input groups
  • Reduce task attempt shuffle bytes

The following sections provide information