MapR 5.0 Documentation : Planning Roles

In most clusters, a small number of nodes runs a set of control services devoted to cluster management and Hadoop infrastructure:

  • CLDB
  • JobTracker
  • WebServer
  • ZooKeeper

The remainder of the nodes are devoted to services related to data processing and storage:

  • FileServer
  • TaskTracker

Supplementary services can run on many or few nodes, depending on how the cluster is to be used. Examples:

  • NFS
  • HBase

The following table provides general guidelines for the number of instances of each service to run in a cluster:

Service              Package                   How Many
CLDB                 mapr-cldb                 1-3
FileServer           mapr-fileserver           Most or all nodes
HBase Master         mapr-hbase-master         1-3
HBase RegionServer   mapr-hbase-regionserver   Varies
JobTracker           mapr-jobtracker           1-3
NFS                  mapr-nfs                  Varies
TaskTracker          mapr-tasktracker          Most or all nodes
WebServer            mapr-webserver            One or more
ZooKeeper            mapr-zookeeper            1, 3, 5, or a higher odd number
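
The guidelines in the table can be checked mechanically against a planned layout. The following Python sketch is illustrative only and is not part of MapR: the function name check_counts, the uses_hbase flag, and the layout format (a dictionary mapping a node name to the list of packages planned for it) are all assumptions made for this example. Services whose count is "Varies" or "Most or all nodes" are not checked.

```python
from collections import Counter


def check_counts(layout, uses_hbase=False):
    """Return a list of guideline violations for a planned layout.

    layout is a hypothetical structure: {node_name: [package, ...]}.
    Set uses_hbase=True to also check the HBase Master guideline.
    """
    # Count how many nodes each package is planned on.
    counts = Counter(pkg for pkgs in layout.values() for pkg in pkgs)
    problems = []

    # Services the table limits to 1-3 instances.
    bounded = ["mapr-cldb", "mapr-jobtracker"]
    if uses_hbase:
        bounded.append("mapr-hbase-master")
    for pkg in bounded:
        if not 1 <= counts[pkg] <= 3:
            problems.append(f"{pkg}: expected 1-3 instances, found {counts[pkg]}")

    # WebServer: one or more.
    if counts["mapr-webserver"] < 1:
        problems.append("mapr-webserver: expected one or more instances, found 0")

    # ZooKeeper: 1, 3, 5, or a higher odd number.
    zk = counts["mapr-zookeeper"]
    if zk < 1 or zk % 2 == 0:
        problems.append(f"mapr-zookeeper: expected an odd number of instances, found {zk}")

    return problems


if __name__ == "__main__":
    plan = {
        "node1": ["mapr-cldb", "mapr-jobtracker", "mapr-webserver", "mapr-zookeeper"],
        "node2": ["mapr-zookeeper", "mapr-fileserver", "mapr-tasktracker"],
    }
    for problem in check_counts(plan):
        print(problem)
```

An empty result means only that the counts fall within the table's ranges. It says nothing about whether the placement suits a particular workload.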

Sample Configurations

The following sections describe a few typical ways to deploy a MapR cluster.

Small M3 Cluster

A small M3 cluster runs most control services on only one node (except for ZooKeeper, which runs on three) and data services on the remaining nodes. The M3 license does not permit failover or high availability, and only allows one running CLDB.

Small M5 Cluster

A small M5 cluster runs control services on three nodes and data services on the remaining nodes, providing failover and high availability for all critical services.
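
The two small-cluster descriptions above can be expressed as a simple assignment of packages to nodes. This is a minimal sketch under stated assumptions, not a MapR tool: the function small_cluster_layout and its parameters are hypothetical, the four-node minimum is an arbitrary choice for the example, and in the M3 case the sketch assumes that the two nodes hosting only ZooKeeper also run data services, which the text does not state explicitly.

```python
CONTROL = ["mapr-cldb", "mapr-jobtracker", "mapr-webserver"]
DATA = ["mapr-fileserver", "mapr-tasktracker"]
ZOOKEEPER = "mapr-zookeeper"


def small_cluster_layout(nodes, license_level):
    """Return {node: [packages]} for a small M3 or M5 cluster (hypothetical helper)."""
    if license_level not in ("M3", "M5"):
        raise ValueError("license_level must be 'M3' or 'M5'")
    if len(nodes) < 4:
        # Assumption for this sketch only: at least four nodes.
        raise ValueError("this sketch assumes at least four nodes")

    # M3: control services on one node. M5: control services on three nodes.
    control_count = 1 if license_level == "M3" else 3

    layout = {}
    for index, node in enumerate(nodes):
        packages = []
        if index < control_count:
            packages += CONTROL
        if index < 3:
            # ZooKeeper runs on three nodes in both configurations.
            packages.append(ZOOKEEPER)
        if index >= control_count:
            # Assumption: every node without CLDB/JobTracker/WebServer runs data services.
            packages += DATA
        layout[node] = packages
    return layout


if __name__ == "__main__":
    names = ["node1", "node2", "node3", "node4", "node5"]
    for level in ("M3", "M5"):
        print(level)
        for node, packages in small_cluster_layout(names, level).items():
            print(" ", node, ", ".join(packages))
```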

Larger M5 Cluster

A large cluster (over 100 nodes) should isolate CLDB nodes from the TaskTracker and NFS nodes.

In large clusters, do not run TaskTracker and ZooKeeper on the same node.
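
The two isolation rules for large clusters can be checked against a planned layout. The sketch below is hypothetical and not part of MapR: the function isolation_violations, the large_threshold parameter, and the layout format (node name mapped to a list of packages) are assumptions made for illustration. The threshold of 100 nodes comes from the text above.

```python
def isolation_violations(layout, large_threshold=100):
    """Return rule violations for a large cluster (hypothetical helper).

    layout is {node_name: [package, ...]}. Clusters at or below
    large_threshold nodes are not checked, since the rules above
    apply to clusters of over 100 nodes.
    """
    if len(layout) <= large_threshold:
        return []

    problems = []
    for node, packages in sorted(layout.items()):
        present = set(packages)
        # Rule 1: CLDB nodes are isolated from TaskTracker and NFS.
        if "mapr-cldb" in present and present & {"mapr-tasktracker", "mapr-nfs"}:
            problems.append(f"{node}: CLDB shares a node with TaskTracker or NFS")
        # Rule 2: TaskTracker and ZooKeeper do not share a node.
        if {"mapr-tasktracker", "mapr-zookeeper"} <= present:
            problems.append(f"{node}: TaskTracker and ZooKeeper share a node")
    return problems


if __name__ == "__main__":
    plan = {f"node{i}": ["mapr-fileserver", "mapr-tasktracker"] for i in range(120)}
    plan["node0"] = ["mapr-cldb", "mapr-fileserver"]
    plan["node1"] = ["mapr-zookeeper", "mapr-fileserver", "mapr-tasktracker"]
    for problem in isolation_violations(plan):
        print(problem)
```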
