MapR 4.0.x Documentation : Setting Up Disks for MapR

MapR formats and uses disks for the Lockless Storage Services layer (MapR-FS), and records these disks in the file disktab. In a production environment, or when testing performance, MapR should be configured to use physical hard drives and partitions. In some cases, it is necessary to reinstall the operating system on a node so that the physical hard drives are available for direct use by MapR. Reinstalling the operating system provides an unrestricted opportunity to configure the hard drives. If the installation procedure assigns hard drives to be managed by the Linux Logical Volume Manager (LVM) by default, you should explicitly remove the drives you plan to use with MapR from the LVM configuration. It is common to let LVM manage one physical drive containing the operating system partition(s) and to leave the rest unmanaged by LVM for use with MapR.

This section describes how to set up disks during the normal installation process. For other uses of the disksetup command, see the disksetup command page.

The following procedures are intended for use on physical clusters or Amazon EC2 instances. On EC2 instances, EBS volumes can be used as MapR storage, although performance will be slow.

If you are using MapR on Amazon EMR, you do not have to use this procedure; the disks are set up for you automatically.

To determine if a disk or partition is ready for use by MapR:

  1. Run sudo lsof <partition> to determine whether any processes are already using the disk or partition.
  2. Run sudo fuser <partition>. No output indicates that no process is accessing the disk or partition.
  3. Run mount and confirm that the disk or partition does not appear in the output. If it is mounted, unmount it using the umount command.
  4. Confirm that the disk or partition has no entry in the /etc/fstab file; comment out or delete any such entry.
  5. Confirm that the disk or partition is accessible to standard Linux tools such as mkfs. You should be able to format the partition using a command like sudo mkfs.ext3 <partition> (which erases any data on the partition), as this is similar to the operations MapR performs during installation. If mkfs cannot access and format the partition, MapR will very likely encounter the same problem.

Any disk or partition that passes the above testing procedure can be added to the list of disks and partitions passed to the disksetup command.
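The non-destructive checks above (open files, mount table, /etc/fstab) can be sketched as a small shell helper. This is only an illustration, not part of MapR: the function names in_table and check_disk are hypothetical, the helper matches device paths literally (so fstab entries written as UUID= or LABEL= are not detected), and fuser must run as root to see every process. The mkfs test is deliberately left out because it erases data.

```shell
# in_table DEVICE FILE: succeed if DEVICE is the first field of a line in FILE.
# Works for the layout shared by /etc/fstab and /proc/mounts; commented-out
# lines do not match because their first field starts with '#'.
in_table() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "$2"
}

# check_disk DEVICE [FSTAB] [MOUNTS]: print each problem found and return 1,
# or return 0 if the device passes every check. The table paths are
# parameters so the checks can be tried against copies of the real files.
check_disk() {
    dev=$1
    fstab=${2:-/etc/fstab}
    mounts=${3:-/proc/mounts}
    rc=0
    if in_table "$dev" "$mounts"; then
        echo "$dev: mounted"
        rc=1
    fi
    if in_table "$dev" "$fstab"; then
        echo "$dev: listed in fstab"
        rc=1
    fi
    # fuser writes the PIDs of processes using the device to stdout.
    if command -v fuser >/dev/null 2>&1 && [ -n "$(fuser "$dev" 2>/dev/null)" ]; then
        echo "$dev: in use by a process"
        rc=1
    fi
    return $rc
}
```

For example, sudo sh -c '. ./check_disk.sh; check_disk /dev/sdc1' (assuming the functions are saved in a file named check_disk.sh) prints nothing and succeeds when the partition passes all three checks.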

To specify disks or partitions for use by MapR:

The disksetup script formats disks for use by the MapR cluster. Create a text file /tmp/disks.txt listing the disks and partitions for use by MapR on the node. Each line lists either a single disk or all applicable partitions on a single disk. When listing multiple partitions on a line, separate them with spaces. For example:

/dev/sdc1 /dev/sdc2 /dev/sdc4
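Before the file is passed to disksetup, the rule that each line covers a single disk can be checked with a short script. The following is a minimal sketch, not a MapR tool: the function names parent_disk and validate_disk_list are hypothetical, and the partition-to-disk mapping assumes sdX-style names (a trailing partition number is stripped), so NVMe-style names such as nvme0n1p1 would need different handling.

```shell
# parent_disk NAME: strip a trailing partition number, e.g. /dev/sdc1 -> /dev/sdc.
# Assumption: sdX-style device names only.
parent_disk() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# validate_disk_list FILE: return 0 if every line names entries from one disk;
# otherwise print each offending entry and return 1.
validate_disk_list() {
    rc=0
    lineno=0
    while read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        first=
        # Entries on a line are separated by spaces, so word splitting is intended.
        for entry in $line; do
            disk=$(parent_disk "$entry")
            if [ -z "$first" ]; then
                first=$disk
            elif [ "$disk" != "$first" ]; then
                echo "line $lineno: $entry is not on $first"
                rc=1
            fi
        done
    done < "$1"
    return $rc
}
```

Calling validate_disk_list /tmp/disks.txt succeeds silently for the example line above, since all three partitions belong to /dev/sdc.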

Later, when you run disksetup to format the disks, specify the disks.txt file. For example:

/opt/mapr/server/disksetup -F /tmp/disks.txt

The disksetup script removes all data from the specified disks. Make sure you specify the disks correctly, and that any data you wish to keep has been backed up elsewhere.

If you are re-using a node that was used previously in another cluster, be sure to format the disks to remove any traces of data from the old cluster.

Run disksetup only after running

To evaluate MapR using a flat storage file instead of formatting disks:

When setting up a small cluster for evaluation purposes, if a particular node does not have physical disks or partitions available to dedicate to the cluster, you can use a flat file on an existing disk partition as the node's storage. Create a file of at least 16 GB, and include the path to the file in the disk list file for the disksetup script.

The following example creates a 20 GB flat file (bs=1G specifies 1-gigabyte blocks, multiplied by count=20) at /root/storagefile:

$ dd if=/dev/zero of=/root/storagefile bs=1G count=20
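Because dd writes bs × count bytes, and GNU dd treats 1G as 1024³ bytes, the resulting size can be confirmed against the 16 GB minimum before the file is used. The helper below is a sketch; the name file_at_least is hypothetical and not part of MapR.

```shell
# file_at_least FILE BYTES: succeed if FILE is at least BYTES bytes long.
file_at_least() {
    size=$(wc -c < "$1") || return 2
    # $((size)) normalizes any padding some wc implementations add.
    [ "$((size))" -ge "$2" ]
}
```

For the file created above, file_at_least /root/storagefile $((16 * 1024 * 1024 * 1024)) should succeed, since 20 blocks of 1024³ bytes come to 21474836480 bytes.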

Then, you would add the following to the disk list file /tmp/disks.txt to be used by disksetup: