Configuring Storage on a Node

This section describes how to automatically or manually format disks for cluster storage.

The disksetup utility formats disks for use by the MapR cluster. You can either add options to configure.sh so that it runs disksetup for you, or run disksetup manually after you run configure.sh.

The disksetup utility removes all data from the specified disks. Make sure you specify the disks correctly, and back up any data that you want to save. If you are re-using a node that was used previously in another cluster, it is important to format the disks to remove any traces of data from the old cluster. See disksetup for more information about the utility.

Note: The disksetup script assumes that you have free, unmounted physical partitions or hard disks for use by MapR. To determine if a disk or partition is ready for use by MapR, see Setting Up Disks for MapR.
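As a rough first check before you hand disks to disksetup, the following sketch reports whether a device, or any device whose name starts with it (such as its partitions), appears as a mount source. The device names are hypothetical, and reading /proc/mounts assumes a Linux node. A disk that is not mounted can still be in use (for example as swap or as part of a volume group), so treat this as a preliminary check only and follow Setting Up Disks for MapR for the full procedure.

```shell
# Mount table to inspect; /proc/mounts is the Linux default (assumption).
mounts_file="${MOUNTS_FILE:-/proc/mounts}"

# in_use DEVICE: succeeds when DEVICE, or any device whose name starts with
# DEVICE (for example one of its partitions), is the source of a mount.
in_use() {
  awk -v dev="$1" 'index($1, dev) == 1 { found = 1 } END { exit !found }' "$mounts_file"
}

# Hypothetical candidates; replace with the disks you plan to give to MapR.
for dev in /dev/sdb /dev/sdd; do
  if in_use "$dev"; then
    echo "$dev: mounted, do not pass to disksetup"
  else
    echo "$dev: not mounted"
  fi
done
```

The prefix match is deliberately conservative: it may flag a disk whose name is a prefix of an unrelated device, which errs on the side of not formatting.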

Running configure.sh

Before you run configure.sh, collect the information that you need to run the script based on your requirements and the following list:

  • Note the hostnames of the CLDB and ZooKeeper nodes. Optionally, you can specify the ports for the CLDB and ZooKeeper nodes as well. The default CLDB port is 7222. The default ZooKeeper port is 5181.
  • If a node in the cluster runs the HistoryServer, note the hostname for the HistoryServer. The HistoryServer node must be specified using the -HS parameter.

  • If one or more nodes in the cluster run the ResourceManager, note the hostname or IP address for each ResourceManager node. Based on the version you install and your ResourceManager high availability requirements, you may need to specify the ResourceManager nodes using the -RM parameter. Starting in version 4.0.2, high availability for the ResourceManager is configured by default and does not need to be specified.

  • If mapr-fileserver is installed on this node, you can use configure.sh to format the disks and set up partitions, or you can manually run disksetup after you run configure.sh.

  • For a cluster node that is on a VM, use the --isvm parameter when you run configure.sh, so that the script uses less memory.

  • Starting in MapR version 5.1, the MapR Converged Community Edition (formerly Community Edition) license provides read/write access to MapR-DB tables. If you do not plan to access MapR-DB on your cluster, run configure.sh with the -noDB parameter on each node. This results in less memory being allocated to MFS, and more memory being allocated to MapReduce services.
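The checklist above can be turned into the optional part of a configure.sh command line. The following is a minimal sketch, not a definitive procedure: the variable names are made up for illustration, the hostname is a placeholder, and only the -HS, -RM, and -noDB parameters described above are used. It prints the flags rather than running configure.sh.

```shell
# Answers to the checklist; all values are placeholders (assumptions).
history_server="r5n1.sj.us"   # empty if no node runs the HistoryServer
resource_manager=""           # empty if -RM is not needed for your version
uses_mapr_db="no"             # "no" adds -noDB

optional_flags=""
if [ -n "$history_server" ]; then
  optional_flags="$optional_flags -HS $history_server"
fi
if [ -n "$resource_manager" ]; then
  optional_flags="$optional_flags -RM $resource_manager"
fi
if [ "$uses_mapr_db" = "no" ]; then
  optional_flags="$optional_flags -noDB"
fi

# Drop the leading space left by the first appended flag.
optional_flags="${optional_flags# }"
echo "$optional_flags"
```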

Using configure.sh to run disksetup

To use configure.sh to run disksetup and configure storage, add the following options to configure.sh:

Option        Description

-D            Specifies a list of disks separated by a single space. configure.sh passes the value of this parameter to the disksetup utility. You cannot specify partitions with this option.

-F            Specifies a text file that you create, such as /tmp/disks.txt, that lists the disks and partitions for use by MapR on the node. configure.sh passes the file to the disksetup utility. Each line lists either a single disk or all applicable partitions on a single disk. When listing multiple partitions on a line, separate the partitions with spaces.

              Example
              /dev/sdb
              /dev/sdc1 /dev/sdc2 /dev/sdc4
              /dev/sdd

-disk-opts    Optional. configure.sh passes the value of this parameter to the disksetup utility. For example, if you include -disk-opts FW5 when you run configure.sh, configure.sh runs disksetup with the -F and -W5 options.
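The disk list file for the -F option can be prepared and sanity-checked before configure.sh runs. The sketch below writes the example list shown above, checks that every entry is a /dev/ path, and prints (but does not run) a configure.sh command that would use the file. The device names and the hostname are placeholders, and the DISK_FILE variable is an assumption made for illustration.

```shell
# Path of the disk list file; /tmp/disks.txt matches the example above.
disk_file="${DISK_FILE:-/tmp/disks.txt}"

# One line per disk: a whole disk, or the partitions of one disk
# separated by spaces. Placeholder devices; replace with your own.
cat > "$disk_file" <<'EOF'
/dev/sdb
/dev/sdc1 /dev/sdc2 /dev/sdc4
/dev/sdd
EOF

# Count entries that do not look like device paths, and count all entries.
bad_entries=$(awk '{ for (i = 1; i <= NF; i++) if ($i !~ /^\/dev\//) n++ } END { print n + 0 }' "$disk_file")
device_count=$(awk '{ n += NF } END { print n + 0 }' "$disk_file")

if [ "$bad_entries" -ne 0 ]; then
  echo "Found $bad_entries entries that are not /dev/ paths" >&2
  exit 1
fi
echo "$device_count devices listed in $disk_file"

# Dry run: show the command instead of executing it, because disksetup
# removes all data from the listed disks.
configure_cmd="/opt/mapr/server/configure.sh -C r1n1.sj.us -Z r1n1.sj.us -F $disk_file"
echo "$configure_cmd"
```

Printing the command first gives you a chance to review the device list, which matters because formatting cannot be undone.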

Running configure.sh on a node

This script can configure a node for the first time or update existing node configurations. Therefore, it has many configuration options that you can use based on your requirements.

The script configure.sh takes a comma-separated list of CLDBs and ZooKeepers along with optional ResourceManager host names, HistoryServer host name, log file, and cluster name, using the following syntax:

/opt/mapr/server/configure.sh -C <host>[:<port>][,<host>[:<port>]...] -Z <host>[:<port>][,<host>[:<port>]...] [-RM <host>] [-HS <host>] [-L <logfile>] [-N <cluster name>]
Example
/opt/mapr/server/configure.sh -C r1n1.sj.us,r3n1.sj.us,r5n1.sj.us -Z r1n1.sj.us,r2n1.sj.us,r3n1.sj.us,r4n1.sj.us,r5n1.sj.us -HS r5n1.sj.us -N MyCluster
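If you want to state the ports explicitly, the comma-separated host:port lists for -C and -Z can be built from plain host lists. The following sketch assumes the default ports mentioned earlier (7222 for CLDB, 5181 for ZooKeeper) and reuses the hostnames from the example above. The join_hosts helper is hypothetical, and the command is printed rather than executed.

```shell
# Hostnames from the example above; ports are the documented defaults.
cldb_hosts="r1n1.sj.us r3n1.sj.us r5n1.sj.us"
zk_hosts="r1n1.sj.us r2n1.sj.us r3n1.sj.us r4n1.sj.us r5n1.sj.us"
cldb_port=7222
zk_port=5181

# join_hosts PORT HOST...: print "host:port,host:port,..." (hypothetical helper).
join_hosts() {
  port="$1"
  shift
  out=""
  for h in "$@"; do
    out="${out:+$out,}$h:$port"
  done
  printf '%s\n' "$out"
}

# The host lists are left unquoted on purpose so that they split on spaces.
cldb_list=$(join_hosts "$cldb_port" $cldb_hosts)
zk_list=$(join_hosts "$zk_port" $zk_hosts)

# Dry run: print the assembled command for review.
echo "/opt/mapr/server/configure.sh -C $cldb_list -Z $zk_list -HS r5n1.sj.us -N MyCluster"
```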

Manually running disksetup

If you did not use configure.sh to run disksetup, run disksetup on the node now. See Formatting Disks on a Node.