MapR 5.0 Documentation : Best Practices

Disk Setup

It is not necessary to set up RAID (Redundant Array of Independent Disks) on disks used by MapR-FS. MapR uses a script called disksetup to set up storage pools. In most cases, you should let MapR calculate storage pools using the default stripe width of two or three disks. If you anticipate a high volume of random-access I/O, you can use the -W option with disksetup to specify larger storage pools of up to 8 disks each.
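For example, to create storage pools of up to five disks each from a list of disks, you might run disksetup as follows (the path /tmp/disks.txt is an illustrative disk list, one device per line):

```shell
# Sketch: wider storage pools for workloads with heavy random-access I/O.
# /tmp/disks.txt is an assumed file listing the disks to format.
/opt/mapr/server/disksetup -W 5 /tmp/disks.txt
```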

Setting Up MapR NFS

MapR uses version 3 of the NFS protocol. NFS version 4 bypasses the port mapper and attempts to connect to the default port only. If you are running NFS on a non-standard port, mounts from NFS version 4 clients time out. Use the -o nfsvers=3 option to specify NFS version 3.
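For example, an NFSv3 mount of the cluster from a Linux client might look like this (the hostname and mount point are illustrative):

```shell
# Force NFS version 3 so the client uses the port mapper
# instead of connecting to the default port only.
mount -o nfsvers=3 usa-node01:/mapr /mapr
```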

NIC Configuration

For high performance clusters, use more than one network interface card (NIC) per node. MapR can detect multiple IP addresses on each node and load-balance throughput automatically.

Isolating CLDB Nodes

In a large cluster (100 nodes or more), create CLDB-only nodes to ensure high performance. This configuration also provides additional control over the placement of CLDB data, for load balancing, fault tolerance, or high availability (HA). Setting up CLDB-only nodes involves restricting the CLDB volume to its own topology and making sure all other volumes are on a separate topology. Because both the CLDB-only path and the non-CLDB path are children of the root topology path, new non-CLDB volumes are not guaranteed to stay off the CLDB-only nodes. To avoid this problem, set a default volume topology. See Setting Default Volume Topology.

To set up a CLDB-only node:

  1. Set up the node as usual:
    • Prepare the node, making sure it meets the requirements.
    • Add the MapR repository.
  2. Install the following packages on the node:
    • mapr-cldb
    • mapr-webserver
    • mapr-core
    • mapr-fileserver
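On a Red Hat-based system, for example, the package installation might look like this (use your distribution's package manager; apt-get on Ubuntu):

```shell
# Install the packages for a CLDB-only node (Red Hat/CentOS example).
yum install mapr-cldb mapr-webserver mapr-core mapr-fileserver
```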

To set up a volume topology that restricts the CLDB volume to specific nodes:

  1. Move all CLDB nodes to a CLDB-only topology (e.g., /cldbonly) using the MapR Control System or the following command:
    maprcli node move -serverids <CLDB nodes> -topology /cldbonly
  2. Restrict the CLDB volume to the CLDB-only topology. Use the MapR Control System or the following command:
    maprcli volume move -name mapr.cldb.internal -topology /cldbonly
  3. If the CLDB volume is present on nodes not in /cldbonly, increase the replication factor of mapr.cldb.internal to create enough copies in /cldbonly using the MapR Control System or the following command:
    maprcli volume modify -name mapr.cldb.internal -replication <replication factor>
  4. Once the volume has sufficient copies, remove the extra replicas by reducing the replication factor to the desired value using the MapR Control System or the command used in the previous step.

To move all other volumes to a topology separate from the CLDB-only nodes:

  1. Move all non-CLDB nodes to a non-CLDB topology (e.g., /defaultRack) using the MapR Control System or the following command:
    maprcli node move -serverids <all non-CLDB nodes> -topology /defaultRack
  2. Restrict all existing volumes to the topology /defaultRack using the MapR Control System or the following command:
    maprcli volume move -name <volume> -topology /defaultRack
    All volumes except mapr.cluster.root are re-replicated to the changed topology automatically.

    To prevent subsequently created volumes from encroaching on the CLDB-only nodes, set a default topology that excludes the CLDB-only topology.
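For example, assuming /defaultRack is your non-CLDB topology, the default volume topology can be set with maprcli (see Setting Default Volume Topology for details):

```shell
# Make /defaultRack the default topology for newly created volumes,
# keeping them off the CLDB-only nodes.
maprcli config save -values '{"cldb.default.volume.topology":"/defaultRack"}'
```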

Isolating ZooKeeper Nodes

For large clusters (100 nodes or more), isolate the ZooKeeper service on nodes that do not perform any other function. Isolation enables ZooKeeper to perform its duties without competing for resources with other processes. Installing a ZooKeeper-only node is similar to a typical node installation, but with a specific subset of packages.

To prevent MapR from using the node for data storage, do not install the FileServer package on an isolated ZooKeeper node.

To set up a ZooKeeper-only node:

  1. Set up the node as usual:
    • Prepare the node, making sure it meets the requirements.
    • Add the MapR repository.
  2. Install the following packages on the node:
    • mapr-zookeeper
    • mapr-zk-internal
    • mapr-core
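On a Red Hat-based system, for example:

```shell
# Install only the ZooKeeper-related packages; mapr-fileserver is
# deliberately omitted so the node is not used for data storage.
yum install mapr-zookeeper mapr-zk-internal mapr-core
```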

Setting Up RAID on the Operating System Partition

You can set up RAID on the operating system partition(s) or drive(s) at installation time to provide higher operating system performance (RAID 0), disk mirroring for failover (RAID 1), or both (RAID 10). See the RAID setup instructions on your operating system vendor's website.
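As an illustration, on Linux you could mirror two operating system partitions with mdadm (device names are examples; most installers can also configure RAID at install time):

```shell
# Sketch: create a RAID 1 (mirrored) array for the OS from two partitions.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
```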


ExpressLane

MapR provides an express path (called ExpressLane) that works in conjunction with the Fair Scheduler. ExpressLane lets small MapReduce jobs run when all slots are occupied by long-running tasks. Small jobs are given this special treatment only when the cluster is busy, and only if they meet the criteria specified by the following parameters in mapred-site.xml:

  • Enable small-job fast scheduling inside the Fair Scheduler. TaskTrackers reserve an ephemeral slot, which is used for small jobs when the cluster is busy.
  • Small job definition: the maximum number of map tasks allowed in a small job.
  • Small job definition: the maximum number of reduce tasks allowed in a small job.
  • Small job definition: the maximum input size, in bytes, allowed for a small job. The default is 10 GB.
  • Small job definition: the maximum estimated input size per reducer allowed in a small job. The default is 1 GB per reducer.
  • The maximum memory, in MB, reserved for an ephemeral slot. The default is 200 MB. This value must be the same on the JobTracker and TaskTracker nodes.

MapReduce jobs that appear to fit the small job definition but are in fact larger than anticipated are killed and re-queued for normal execution.
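As a sketch, the small-job criteria above correspond to properties in mapred-site.xml along the following lines (the property names and values shown are assumptions based on MapR's fair scheduler small-job configuration; verify them against the documentation for your release):

```xml
<!-- Illustrative ExpressLane settings; confirm property names for your release. -->
<property>
  <name>mapred.fairscheduler.smalljob.schedule.enable</name>
  <value>true</value> <!-- enable ephemeral-slot fast scheduling -->
</property>
<property>
  <name>mapred.fairscheduler.smalljob.max.maps</name>
  <value>10</value> <!-- maximum map tasks in a small job -->
</property>
<property>
  <name>mapred.fairscheduler.smalljob.max.inputsize</name>
  <value>10737418240</value> <!-- 10 GB maximum input size -->
</property>
```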


HBase

  • The HBase write-ahead log (WAL) writes many tiny records, and compressing it would cause massive CPU load. Before using HBase, turn off MapR compression for directories in the HBase volume (normally mounted at /hbase). Example:

    hadoop mfs -setcompression off /hbase
  • You can check whether compression is turned off in a directory or mounted volume by using hadoop mfs to list the file contents. Example:

    hadoop mfs -ls /hbase

    The letter Z in the output indicates compression is turned on; the letter U indicates compression is turned off. See hadoop mfs for more information.

  • On any node where you plan to run both HBase and MapReduce, give more memory to the FileServer than to the RegionServer so that the node can handle high throughput. For example, on a node with 24 GB of physical memory, it might be desirable to limit the RegionServer to 4 GB, give 10 GB to MapR-FS, and give the remainder to TaskTracker. To change the memory allocated to each service, edit the /opt/mapr/conf/warden.conf file. See Resource Allocation for Jobs and Applications for more information.
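As a sketch, memory allocation in warden.conf is controlled by per-service heap-size settings along the following lines (the key names and values here are illustrative; check the exact keys in your own /opt/mapr/conf/warden.conf):

```
service.command.mfs.heapsize.percent=35
service.command.mfs.heapsize.min=512
service.command.mfs.heapsize.max=4000
```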

Tuning for SSDs

On servers with SSDs:

  1. Enable the TRIM operation in the mfs.conf file.
    To enable it, set mfs.ssd.trim.enabled to 1. For example:
    mfs.ssd.trim.enabled=1
  2. Disable I/O throttling in the mfs.conf file.
    To disable it, set mfs.disk.iothrottle.count to 50000 (the default value is 100). For example:
    mfs.disk.iothrottle.count=50000
  3. Create one storage pool per SSD, instead of the default of three drives per storage pool.
    To do so, run disksetup with a stripe width of 1:
    /opt/mapr/server/disksetup -W 1 disks.txt

  4. Enable the noop scheduler on each SSD by running this command (sda is an example device name):
    echo noop > /sys/block/sda/queue/scheduler