This document provides information about the following components and services, and also describes their roles in managing a MapR cluster:
Zookeeper is a coordination service for distributed applications. It provides a shared hierarchical namespace that is organized like a standard file system. The namespace consists of data registers called znodes, for Zookeeper data nodes, which are similar to files and directories. A name in the namespace is a sequence of path elements where each element is separated by a
/ character, such as the path
/app1/p_2 shown here:
The znode hierarchy is kept in-memory within each ZooKeeper server in order to minimize latency and to provide high throughput of workloads.
The ZooKeeper Ensemble
The ZooKeeper service is replicated across a set of hosts called an ensemble. One of the hosts is designated as the leader, while the other hosts are followers. ZooKeeper uses a leader election process to determine which ZooKeeper server acts as the leader, or master. If the ZooKeeper leader fails, a new leader is automatically chosen to take its place.
Establishing a ZooKeeper Quorum
As long as a majority (a quorum) of the ZooKeeper servers are available, the Zookeeper service is available. For example, if the ZooKeeper service is configured to run on five nodes, three of them form a quorum. If two nodes fail (or one is taken off-line for maintenance and another one fails), a quorum can still be maintained by the remaining three nodes. An ensemble of five ZooKeeper nodes can tolerate two failures. An ensemble of three ZooKeeper nodes can tolerate only one failure. Because a quorum requires a majority, an ensemble of four ZooKeeper nodes can only tolerate one failure, and therefore offers no advantages over an ensemble of three ZooKeeper nodes. In most cases, you should run three or five ZooKeeper nodes on a cluster. Larger quorum sizes result in slower write operations.
Ensuring Node State Consistency
Each ZooKeeper server maintains a record of all znode write requests in a transaction log on the disk. The ZooKeeper leader issues timestamps to order the write requests, which, when executed, update elements in the shared data store. Each ZooKeeper server must sync transactions to disk and wait for a majority of ZooKeeper servers (a quorum) to acknowledge an update. Once an update is held by a quorum of nodes, a successful response can be returned to clients. By ordering the write requests with timestamps and waiting for a quorum to be established to validate updates, ZooKeeper avoids race conditions and ensures that node state is consistent.
Warden is a light Java application that runs on all the nodes in a cluster and coordinates cluster services. Warden’s job on each node is to start, stop, or restart the appropriate services, and allocate the correct amount of memory to them. Warden makes extensive use of the znode abstraction discussed in the ZooKeeper section of this Guide to monitor the state of cluster services.
Each service running in a cluster has a corresponding znode in the ZooKeeper namespace, named in the pattern
/services/<servicename>/<hostname>. Warden’s Watcher interface monitors znodes for changes and acts when a znode is created or deleted, or when child znodes of a monitored znode are created or deleted.
Warden configuration is contained in the
warden.conf file, which lists service triplets in the form
<number of nodes>
<dependencies>. The number of nodes element of this triplet controls the number of concurrent instances of the service that can run on the cluster. Some services, such as the JobTracker, are restricted to one running instance per cluster, while others, such as the FileServer, can run on every node. The Warden monitors changes to its configuration file in real time.
When a configuration triplet lists another service as a dependency, the Warden only starts that service after the dependency service is running.
Memory Management with the Warden
System administrators can configure how the cluster’s memory is allocated to running the operating system, MapR-FS, and Hadoop services. The configuration files
/opt/mapr/conf/conf.d/warden.<servicename>.conf include parameters that define how much of the memory on a node is allocated to the operating system, MapR-FS, and Hadoop services.
You can edit the following memory parameters to reserve memory:
service.<servicename>.heapsize.percentparameter controls the percentage of system memory allocated to the named service.
service.<servicename>.heapsize.maxparameter defines the maximum heapsize used when invoking the service.
service.<servicename>.heapsize.minparameter defines the minimum heapsize used when invoking the service.
For example, the
service.command.os.heapsize.min parameters in the
warden.conf file controls the amount of memory that Warden allocates to the host operating system before allocating memory to other services.
The actual heap size used when invoking a service is a combination of the three parameters according to the formula
max(heapsize.min, min(heapsize.max, total-memory * heapsize.percent / 100)).
For more information, see Memory Allocation for Nodes.
The Warden and Failover
The Warden on each node watches appropriate znodes to determine whether to start or stop services during failover. The following paragraphs provide failover examples for the CLDB, JobTracker, and ResourceManager. Note that not all failover involves the Warden; NFS failover is accomplished using VIPs.
The ZooKeeper contains a znode corresponding to the active master CLDB. This znode is monitored by the slave CLDBs. When the master CLDB znode is deleted, the slave CLDBs recognize that the master CLDB is no longer running. The slave CLDBs contact Zookeeper in an attempt to become the new master CLDB. The first CLDB to get a lock on the znode in Zookeeper becomes the new master.
If the node running the JobTracker fails and the Warden on the JobTracker node is unable to restart it, Warden starts a new instance of the JobTracker on another node. The Warden on every JobTracker node watches the JobTracker’s znode for changes. When the active JobTracker’s znode is deleted, the Warden daemons on other JobTracker nodes attempt to launch the JobTracker. The Warden service on each JobTracker node works with the Zookeeper to ensure that only one JobTracker is running in the cluster.
In order for failover to occur, at least two nodes in the cluster should include the JobTracker role. No further configuration is required.
Starting in version 4.0.2, if the node running the ResourceManager fails and the Warden on the ResourceManager node is unable to restart it, Warden starts a new instance of the ResourceManager on another node. The Warden on every ResourceManager node watches the ResourceManager’s znode for changes. When the active ResourceManager’s znode is deleted, the Wardens on other ResourceManager nodes attempt to launch the ResourceManager. The Warden on each ResourceManager node works with the Zookeeper to ensure that only one ResourceManager is running in the cluster.
In order for failover to occur in this manner, at least two nodes in the cluster should include the ResourceManager role and your cluster must be use the zero configuration failover implementation. For more information, see ResourceManager High Availability.
The Warden and Pluggable Services
Services can be plugged into the Warden’s monitoring infrastructure by setting up an individual configuration file for each supported service in the
/opt/mapr/conf/conf.d directory, named in the pattern
<number of nodes>
<dependencies> triplets for a pluggable service are stored in the individual
warden.<servicename>.conf files, not in the main
The following services have configuration files pre-configured at installation:
As with other Warden services, the Warden daemon monitors the znodes for a configured component’s service and restarts the service as specified by the configuration triplet. The configuration file also specifies resource limits for the service, any ports used by the service, and a location for log files.
The Container Location Database (CLDB) service tracks the following information about every container in MapR-FS:
The node where the container is located.
The container’s size.
The volume the container belongs to.
The policies, quotas, and usage for that volume.
For more information on containers, see the MapR-FS section of this Guide.
The CLDB also tracks fileservers in the cluster and node activity. Running the CLDB service on multiple nodes distributes lookup operations across those nodes for load balancing, and also provides high availability.
When a cluster runs the CLDB service on multiple nodes, one node acts as the master CLDB and the others act as slaves. The master node has read and write access to the file system, while slave nodes only have read access. The
kvstore (key-value store) container has the container ID 1, and holds cluster-related information. The ZooKeeper tracks container information for the kvstore container. The CLDB assigns a container ID to each new container it creates. The CLDB service tracks the location of containers in the cluster by the container ID.
When a client application opens a file, the application queries the CLDB for for the container ID of the root volume’s name container. The CLDB returns the container ID and the IP addresses of the nodes in the cluster where the replicas of that container are stored. The client application looks up the volume associated with the file in the root volume’s name container, then queries the CLDB for the container ID and IP addresses of the nodes in the cluster with the name container for the target volume. The target volume’s name container has the file ID and inode for the target file. The client application uses this information to open the file for a read or write operation.
Each fileserver heartbeats to the CLDB periodically, at a frequency ranging anywhere from 1-3 seconds depending on the cluster size, to report its status and container information. The CLDB may raise alarms based on the status communicated by the FileServer.
In the MapR Distribution for Apache Hadoop, a node can run both MapReduce v1 (MRv1) and MapReduce v2 (MRv2).
Figure 3. MapReduce 1.0 and MapReduce 2.0 on the same cluster
When you submit a non-MapReduce application to the the cluster, such as a Spark application, the cluster processes the application using the YARN framework.
For MapReduce programs, you can specify which framework to use by configuring one of the following MapReduce modes:
classic: Map Reduce v1 job (Use JobTracker and TaskTracker to run the MapReduce program).
yarn: MapReduce v2 application (Use Resource Manager, Node Manager, and the MapReduce ApplicationMaster to run the MapReduce program).
By default, client and cluster nodes submit MapReduce programs to the YARN framework unless you configure them to use the classic framework. Therefore, if you only plan to run YARN applications (MapReduce v2 and other applications that can run on YARN), no change is required.
For information on how to configure the MapReduce mode, see Managing the MapReduce Mode.
Note: You may need to recompile existing MapReduce v1 jobs in order to successfully run the job in the cluster. For more information, see Configuring Existing MapReduce v1 Jobs to Run in MapR 4.0.x.
Each service on a node has one or more configuration files associated with it. The default version of each configuration file is stored locally under
Customized versions of the configuration files are placed in the
mapr.configuration volume, which is mounted at
/var/mapr/configuration. The following diagram illustrates where each configuration file is stored:
MapR uses the
pullcentralconfig script to detect customized configuration files in
/var/mapr/configuration. This script is launched every five minutes by default. When the script finds a customized file, it overwrites the local files in
/opt/mapr. First, the script looks for node-specific custom configuration files under
/var/mapr/configuration/nodes/<hostname>. If the script does not find any configuration files at that location, the script searches for cluster-wide configuration files under
/default directory stores cluster-wide configuration files that apply to all nodes in the cluster by default.