YARN is a resource management and scheduling framework that distributes resource management and job management duties. YARN assigns the resource management and job management duties as follows:
ResourceManager - manages cluster resources and tracks resource usage and node health
ApplicationMaster - a framework-specific process that negotiates resources for a single application (a single job or a directed acyclic graph of jobs) which runs in the first container (see below) allocated for the application
The ResourceManager allocates resources among all the applications running the cluster. The ResourceManager includes a pluggable scheduler, which is responsible for allocating resources according to the resource requirements of the running applications. Current MapReduce schedulers, including the Capacity Scheduler and the Fair Scheduler, can be plugged into the YARN scheduler directly.
Each application runs an ApplicationMaster to negotiate resources from the ResourceManager. The ApplicationMaster works with the NodeManagers to execute and monitor tasks.
The duties of the TaskTracker are divided as follows:
NodeManager - One instance runs on each node, to manage that node's resources.
Container - An abstraction representing a unit of resources on a node. The NodeManager provides containers to an application.
The ResourceManager and the NodeManager provide the system for distributed management of applications and resources.
This document includes the following topics related to YARN:
The ResourceManager is mainly concerned with arbitrating available resources in the cluster among competing applications, with the goal of maximum cluster utilization. The ResourceManager includes a pluggable scheduler called the YarnScheduler, which allows different policies for managing constraints such as capacity, fairness, and service level agreements.
Figure 1. Internal Components of the Resource Manager
The Resource manager manages resources as follows:
Each NodeManager takes instructions from the ResourceManager, reporting and handling containers on a single node
Each ApplicationMaster requests resources from the ResourceManager, then works with containers provided by NodeManagers
The ResourceManager communicates with application clients via an interface called the ClientService. A client can submit or terminate an application and gain information about the scheduling queue or cluster statistics through the ClientService.
Administrative requests are served by a separate interface called the AdminService, through which operators can get updated information about cluster operation.
Behind the scenes, the ResourceTrackerService receives node heartbeats from the NodeManager to track new or decommissioned nodes. The NMLivelinessMonitor and NodesListManager keep an updated status of which nodes are healthy so that the scheduler and the ResourceTrackerService can allocate work appropriately.
A component called the ApplicationMasterService manages ApplicationMasters on all nodes, keeping the scheduler informed. A component called the AMLivelinessMonitor keeps a list of ApplicationMasters and their last heartbeat times, in order to let the ResourceManager know what applications are healthy on the cluster. Any ApplicationMaster that does not heartbeat within a certain interval is marked as dead and re-scheduled to run on a new container.
At the core of the ResourceManager is an interface called the ApplicationsManager, which maintains a list of applications that have been submitted, are running, or are completed. The ApplicationsManager accepts job submissions, negotiates the first container for an application (in which the ApplicationMaster will run) and restarts the ApplicationMaster if it fails.
The ResourceManager and NodeManagers communicate via heartbeats.
ResourceManager High Availability
Configure the ResourceManager to be high availability so that the failure of the ResourceManager service is a not single point of failure for the cluster. Starting in 4.0.2, high availability of the ResourceManager is configured by default when you run configure.sh without specifying the -RM parameter. For more information, see ResourceManager High Availability.
The ApplicationMaster is an instance of a framework-specific library that negotiates resources from the ResourceManager and works with the NodeManager to execute and monitor the granted resources (bundled as containers) for a given application. An application can be a process or set of processes, a service, or a description of work.
The ApplicationMaster is run in a container like any other application. The ApplicationsManager, part of the ResourceManager, negotiates for the container in which an application’s ApplicationMaster runs when the application is scheduled by the YarnScheduler.
While an application is running, the ApplicationMaster manages the following:
Dynamic adjustments to resource consumption
Providing status and metrics
The ApplicationMaster is architected to support a specific framework, and can be written in any language since its communication with the NodeManagers and the ResourceManager is accomplished using extensible communication protocols. The ApplicationMaster can be customized to extend the framework or run any other code. For this reason, the ApplicationMaster is not considered trustworthy, and is not run as a trusted service.
An ApplicationMaster typically requests resources on multiple nodes to complete a job by sending the ResourceManager requests that include locality preferences and attributes of the containers. When the ResourceManager is able to allocate a resource to the ApplicationMaster, it generates a lease that the ApplicationMaster pulls on a subsequent heartbeat. A security token associated with the lease guarantees its authenticity when the ApplicationManager presents the lease to the NodeManager to gain access to the container.
The Application Master heartbeats to the ResourceManager to communicate its changing resource needs, and to let the ResourceManager know it is still alive. In response, the ResourceManager can return a lease on additional containers on other nodes, or cancel the lease on some containers. The ApplicationMaster can then adjust its execution strategy to fit the increase or decrease in available resources. When cluster resources become scarce, the ResourceManager can also request that the ApplicationMaster relinquish some resources. The ApplicationMaster can move work to other running containers in order to give up resources gracefully.
The NodeManager runs on each node and manages the following:
Container lifecycle management
Node and container resource usage
Reporting node and container status to the ResourceManager.
When a container is leased to an application, the NodeManager sets up the container’s environment, including the resource constraints specified in the lease and any dependencies such as data or executable files. When instructed by the ResourceManager or the appropriate ApplicationMaster, the NodeManager kills containers. A container can be killed when the ResourceManager reports completion of the application to which it was leased, when the scheduler needs the container for another application, when an ApplicationMaster requests that the container be killed, or when the NodeManager detects that it has exceeded the restraints of its lease. When a container is killed, all its resources are cleaned up, including memory and running tasks. However, a process can mark some output to be preserved until the application itself exits, in order to preserve data beyond the life of the container.
The NodeManager monitors the health of the node, reporting to the ResourceManager when a hardware or software issue occurs so that the scheduler can divert resource allocations to healthy nodes until the issue is resolved.
The NodeManager also offers a number of services to containers running on the node, for example a log aggregation service. The administrator can configure the NodeManager with additional pluggable services.
The core of the NodeManager is the ContainerManager, which maintains a pool of threads in a component called the ContainersLauncher for launching containers. A component called the ContainerTokenSecretManager authenticates incoming requests including container launch requests to ensure that all operations are properly authorized by the ResourceManager. Once a container is launched, a component called the ContainersManager monitors its resource usage in order to prevent runaway containers from affecting other tenants on the cluster.
A YARN container is a result of a successful resource allocation, meaning that the ResourceManager has granted an application a lease to use a specific set of resources in certain amounts on a specific node. The ApplicationMaster presents the lease to the NodeManager on the node where the container has been allocated, thereby gaining access to the resources.
To launch the container, the ApplicationMaster must provide a container launch context (CLC) that includes the following information:
Dependencies (local resources such as data files or shared objects needed prior to launch)
The command necessary to create the process the application plans to launch
The CLC makes it possible for the ApplicationMaster to use containers to run a variety of different kinds of work, from simple shell scripts to applications to virtual machines.
Label-based scheduling provides job placement control on a multi-tenant hadoop cluster. Using label-based scheduling, an administrator can control exactly which nodes are chosen to run jobs submitted by different users and groups.
To use label-based scheduling, an administrator assigns node labels in a text file, then composes queue labels or job labels based on the node labels. When users run jobs, they can place them on specified nodes on a per-job basis (using a job label) or on a per-queue level (using a queue label).
The ResourceManager caches the mapping file, and checks every two minutes (the default monitoring period) for updates. If the file has been modified, the ResourceManager updates the labels for all active ApplicationMasters immediately.
A YARN component called the HistoryServer archives job metrics and metadata. Status on completed applications is available via REST APIs.
MapReduce Version 2
YARN dynamically allocates resources for applications as they execute. The MapReduce version 1 (MRv1) has been rewritten to run as an application on top of YARN; this new version is called MapReduce version 2.0 (MRv2).
Figure 2. A comparison between MapReduce 1.0 and MapReduce 2.0
The main advancement in YARN architecture is the separation of resource management and job management, which were both handled by the same process (the JobTracker) in Hadoop 1.x. Cluster resources and job scheduling are managed by the ResourceManager, and resource negotiation and job monitoring are managed by an ApplicationMaster for each application running on the cluster. In MapReduce, each node advertises a relatively fixed number of map slots and reduce slots. This can lead to resource under-utilization, for example, when there is a heavy reduce load and map slots are available, because the map slots cannot accept reduce tasks (and vice versa).
YARN generalizes resource management for use by new engines and frameworks, allowing resources to be allocated and reallocated for different concurrent applications sharing a cluster. Existing MapReduce applications can run on YARN without any changes. At the same time, because MapReduce is now merely another application on YARN, MapReduce is free to evolve independently of the resource management infrastructure.
How Applications Work in YARN
The following diagram and list of steps provides information about data flow during application execution in YARN.
Figure 3. Control flow during YARN application execution
Application execution consists of the following steps:
A client submits an application to the YARN ResourceManager, including the information required for the CLC.
The ApplicationsManager (in the ResourceManager) negotiates a container and bootstraps the ApplicationMaster instance for the application.
The ApplicationMaster registers with the ResourceManager and requests containers.
The ApplicationMaster communicates with NodeManagers to launch the containers it has been granted, specifying the CLC for each container.
The ApplicationMaster manages application execution. During execution, the application provides progress and status information to the ApplicationMaster. The client can monitor the application’s status by querying the ResourceManager or by communicating directly with the ApplicationMaster.
The ApplicationMaster reports completion of the application to the ResourceManager.
The ApplicationMaster un-registers with the ResourceManager, which then cleans up the ApplicationMaster container.
Direct Shuffle on YARN
During the shuffle phase of a MapReduce job, MapR writes to a MapR-FS volume limited by its topology to the local node instead of writing intermediate data to local disks controlled by the operating system. This improves performance and reduces demand on local disk space while making the output available cluster-wide.
The LocalVolumeAuxiliaryService runs in the NodeManager process. The LocalVolumeAuxiliaryService manages the local volume on each node and cleans up shuffle data after a MapReduce application has finished executing.
Figure 4. The MapR DirectShuffle on YARN
The MRAppMaster service initializes the application by calling initializeApplication() on the LocalVolumeAuxiliaryService.
The MRAppMaster service requests task containers from the ResourceManager. The ResourceManager sends the MRAppMaster information that MRAppMaster uses to request containers from the NodeManager.
The NodeManager on each node launches containers using information about the node’s local volume from the LocalVolumeAuxiliaryService.
Data from map tasks is saved in MRAppMaster for later use in TaskCompletion events, which are requested by reduce tasks.
As map tasks complete, map outputs and map-side spills are written to the local volumes on the map task nodes, generating Task Completion events.
ReduceTasks fetch Task Completion events from the Application Manager. The task Completion events include information on the location of map output data, enabling reduce tasks to copy data from MapOutput locations.
Reduce tasks read the map output information.
Spills and interim merges are written to local volumes on the reduce task nodes.
MRAppMaster calls stopApplication() on the LocalVolumeAuxiliaryService to clean up data on the local volume.
Logging Options on YARN
For YARN applications, there are various logging options to choose from based on the MapR version and the types of applications that you run.
In 4.0.2, you have the following logging options:
- For MapReduce v2 applications, the default logging option is to log files on the local file system. However, central logging and YARN log aggregation are also available.
- For non-MapReduce applications, the default logging option is to log files on the local file system. However,YARN log aggregation is also available.
In 4.0.1, the default logging option is to log files on the local file system. However,YARN log aggregation is also available.
Centralized Logging for MapReduce v2
Centralized logging provides an application-centric view of all the log files generated by NodeManager nodes throughout the cluster. It enables users to gain a complete picture of application execution by having all the logs available in a single directory, without having to navigate from node to node.
The MapReduce program generates three types of log output:
standard output stream - captured in the stdout file
standard error stream - captured in the stderr file
Log4j logs - captured in the syslog file
Centralized logs are available cluster-wide as they are written to the following local volume on the MapR-FS:
Since the log files are stored in a local volume directory that is associated with each NodeManager node, you run the maprcli job linklogs command to create symbolic links for all the logs in a single directory. You can then use tools such as
awk to analyze them from an NFS mount point. You can also view the entire set of logs for a particular application using the HistoryServer UI.
For information about centralized logging and how to enable it, see Managing Centralized Logs for MapReduce v2.
YARN Log Aggregation
The YARN log aggregation option aggregates logs from the local file system and moves log files for completed applications from the local file system to the MapR-FS. This allows users to view the entire set of logs for a particular application using the HistoryServer UI or by running the
yarn logs command.
For more information about YARN log aggregation and how to enable it, see YARN Log Aggregation.