MapR 5.0 Documentation: Setting Up Topology

After you have defined node topology for the nodes in your cluster, you can use volume topology to place volumes on specific racks, nodes, or groups of nodes. This section discusses how to set up both types of topology from the command line or from the MapR Control System (MCS).

Setting Up Node Topology

Your node topology describes the locations of nodes and racks in a cluster. The MapR software uses node topology to determine the location of replicated copies of data. When cluster topology is optimally defined, data is replicated to separate racks, which provides continued data availability in the event of rack or node failure.

Define your cluster's topology by specifying a topology for each node in the cluster. You can use topology to group nodes by rack or switch, depending on how the physical cluster is arranged and how you want MapR to place replicated data.

Topology paths can be as simple or complex as needed to correspond to your cluster layout. In a simple cluster, each topology path might consist of the rack only (for example, /rack-1). In a deployment consisting of multiple large datacenters, each topology path can be much longer (for example, /europe/uk/london/datacenter2/room4/row22/rack5/). MapR uses topology paths to spread out replicated copies of data, placing each copy on a separate path. By setting each path to correspond to a physical rack, you can ensure that replicated data is distributed across racks to improve fault tolerance.

Recommended Node Topology

The node topology described in this section enables you to gracefully migrate data off a node in order to decommission the node for replacement or maintenance while avoiding data under-replication.

  • Establish a /data topology path to serve as the default topology path for the volumes in that cluster.
  • Establish a /decommissioned topology path that is not assigned to any volumes.

Migrating a Volume off a Node

When you need to migrate a data volume off a particular node, move that node from the /data path to the /decommissioned path. Since no data volumes are assigned to that topology path, standard data replication will migrate the data off that node to other nodes that are still in the /data topology path.

You can run the following command to check if a given volume is present on a specified node:

maprcli dump volumenodes -volumename <volume> -json | grep <ip:port>

Run this command for each non-local volume in your cluster. Once all the data has migrated off the node, you can decommission the node or place it in maintenance mode.
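The per-volume check can be wrapped in a loop. The following is a minimal sketch, not a MapR tool: the function name volumes_on_node and the MAPRCLI override variable are hypothetical names introduced here, and the caller supplies the list of volume names.

```shell
# Hypothetical helper (not part of MapR): print each volume that still
# reports the given node in its `dump volumenodes` output.
# MAPRCLI can be overridden for dry runs; it defaults to the real CLI.
MAPRCLI="${MAPRCLI:-maprcli}"

volumes_on_node() {
  node="$1"   # <ip:port> of the node being drained
  shift       # remaining arguments are volume names
  for vol in "$@"; do
    if "$MAPRCLI" dump volumenodes -volumename "$vol" -json | grep -qF -- "$node"; then
      printf '%s\n' "$vol"
    fi
  done
}
```

For example, volumes_on_node <ip:port> <volume1> <volume2> prints the volumes that still have data on that node. Empty output suggests that none of the listed volumes remain there.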

If you need to segregate CLDB data, create a /cldb topology node and move the CLDB nodes under /cldb. Point the topology for the CLDB volume (mapr.cldb.internal) to /cldb. See Isolating CLDB Nodes for details.

Setting Node Topology Manually

You can specify a topology path for one or more nodes using the node move command, or in the MapR Control System using the following procedure.

To set node topology using the MapR Control System:

  1. In the Navigation pane, expand the Cluster group and click the Nodes view.
  2. Select the checkbox beside each node whose topology you wish to set.
  3. Click the Change Topology button to display the Change Topology dialog.
  4. Set the path in the New Path field:
    • To define a new path, type a topology path. Topology paths must begin with a forward slash ('/').
    • To use a path you have already defined, select it from the dropdown.
  5. Click Move Node to set the new topology.
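From the command line, the same change uses the node move command. The sketch below is a hypothetical wrapper that enforces the leading-slash rule and prints the command for review instead of running it. The function name node_move_cmd is an illustration, and the server IDs are assumed to be passed as a comma-separated list.

```shell
# Hypothetical wrapper: validate the topology path, then print the
# `maprcli node move` command instead of running it.
node_move_cmd() {
  ids="$1"    # server IDs (assumed comma-separated)
  topo="$2"   # target topology path
  case "$topo" in
    /*) ;;    # topology paths must begin with a forward slash
    *)  echo "invalid topology path: $topo" >&2; return 1 ;;
  esac
  printf 'maprcli node move -serverids %s -topology %s\n' "$ids" "$topo"
}
```

Calling node_move_cmd <server IDs> /decommissioned prints the command that would move those nodes to the /decommissioned path. Run the printed line once it looks correct.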

Setting Node Topology with a Script

For large clusters, you can specify complex topologies in a text file or by using a script. Each line in the text file or script output specifies a single node and the full topology path for that node in the following format:
<ip or hostname> <topology>

The text file or script must be specified and available on the local filesystem on all CLDB nodes:

  • To set topology with a text file, set the corresponding topology parameter in /opt/mapr/conf/cldb.conf to the text file name.
  • To set topology with a script, set the corresponding topology parameter in /opt/mapr/conf/cldb.conf to the script file name.

If you specify a script and a text file, the MapR system uses the topology specified by the script.
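As an illustration of the line format, the sketch below derives topology lines from hostnames. It assumes a hypothetical naming convention in which each hostname embeds its rack number (for example, node-r03-07 sits in rack 03). Both the convention and the function name topology_lines are assumptions, so adapt the pattern to your own inventory.

```shell
# Hypothetical generator: read hostnames on stdin and print
# "<hostname> <topology>" lines, one per node.
# Assumed naming convention: <name>-r<rack>-<slot>, e.g. node-r03-07.
topology_lines() {
  while read -r host; do
    [ -n "$host" ] || continue
    # Extract the digits between "-r" and the next "-".
    rack="$(printf '%s\n' "$host" | sed -n 's/^.*-r\([0-9][0-9]*\)-.*$/\1/p')"
    if [ -n "$rack" ]; then
      printf '%s /data/rack%s\n' "$host" "$rack"
    else
      # Skip hosts that do not match the assumed convention.
      echo "skipping $host: no rack number found" >&2
    fi
  done
}
```

For the input node-r03-07, this prints the line node-r03-07 /data/rack03. Redirect the output to a file if you want to use it as the topology text file.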

Setting Up Volume Topology

MapR supports data placement control, in which you can place a volume on specific racks, nodes, or groups of nodes by setting its topology to an existing node topology. You can set volume topology using the MapR Control System or with the volume move command.

To set volume topology using the MapR Control System:

  1. In the Navigation pane, expand the MapR Data Platform group and click the Volumes view.
  2. Display the Volume Properties dialog by clicking the volume name or by selecting the checkbox beside the volume name, then clicking the Properties button.
  3. Click Move Volume to display the Move Volume dialog.
  4. Select a topology path that corresponds to the rack or nodes where you would like the volume to reside.
  5. Click Move Volume to return to the Volume Properties dialog.
  6. Click Modify Volume to save changes to the volume.

Setting Default Volume Topology

By default, new volumes are created with a topology of /data. To change the default topology, use the config save command to change the cldb.default.volume.topology configuration parameter. Example:

maprcli config save -values "{\"cldb.default.volume.topology\":\"/data/rack02\"}"

After you run this command, new volumes have the volume topology /data/rack02 by default, which can be useful for restricting new volume data to a subset of the cluster.
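Escaping the quotes in the -values argument by hand is error-prone. The following sketch builds the JSON string from a topology path. The function name default_topology_values is hypothetical, and the helper assumes the path contains no quotes or backslashes.

```shell
# Hypothetical helper: build the JSON value for `maprcli config save`.
# Assumes the path contains no double quotes or backslashes.
default_topology_values() {
  printf '{"cldb.default.volume.topology":"%s"}' "$1"
}
```

It can then be used as maprcli config save -values "$(default_topology_values /data/rack02)", which passes the same string as the example above.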

Example: Setting Up CLDB-Only Nodes

In a large cluster (100 nodes or more), create CLDB-only nodes to ensure high performance. This configuration also provides additional control over the placement of the CLDB data, for load balancing, fault tolerance, or high availability (HA). Setting up CLDB-only nodes involves restricting the CLDB volume to its own topology and making sure all other volumes are on a separate topology. Because both the CLDB-only path and the non-CLDB path are children of the root topology path, new non-CLDB volumes are not guaranteed to stay off the CLDB-only nodes. To avoid this problem, set a default volume topology. See Setting Default Volume Topology.

To set up a CLDB-only node:

  1. Set up the node as usual:
    • Prepare the node, making sure it meets the requirements.
    • Add the MapR Repository.
  2. Install the following packages on the node:
    • mapr-cldb
    • mapr-webserver
    • mapr-core
    • mapr-fileserver

To set up a volume topology that restricts the CLDB volume to specific nodes:

  1. Move all CLDB nodes to a CLDB-only topology (e.g. /cldbonly) using the MapR Control System or the following command:
    maprcli node move -serverids <CLDB nodes> -topology /cldbonly
  2. Restrict the CLDB volume to the CLDB-only topology. Use the MapR Control System or the following command:
    maprcli volume move -name mapr.cldb.internal -topology /cldbonly
  3. If the CLDB volume is present on nodes not in /cldbonly, increase the replication factor of mapr.cldb.internal to create enough copies in /cldbonly using the MapR Control System or the following command:
    maprcli volume modify -name mapr.cldb.internal -replication <replication factor>
  4. Once the volume has sufficient copies, remove the extra replicas by reducing the replication factor to the desired value using the MapR Control System or the command used in the previous step.
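The four steps above can be laid out as a reviewable plan before anything is run. The sketch below only prints the commands in order. The function name cldb_isolation_plan and its three arguments (server IDs, a temporary replication factor, and the final replication factor) are hypothetical, and the wait between the last two commands is left as a manual check.

```shell
# Hypothetical planner: print the CLDB isolation commands in order.
# Nothing is executed; review the output before running any line.
cldb_isolation_plan() {
  ids="$1"         # server IDs of the CLDB nodes
  temp_repl="$2"   # temporarily raised replication factor
  final_repl="$3"  # desired replication factor afterwards
  printf 'maprcli node move -serverids %s -topology /cldbonly\n' "$ids"
  printf 'maprcli volume move -name mapr.cldb.internal -topology /cldbonly\n'
  printf 'maprcli volume modify -name mapr.cldb.internal -replication %s\n' "$temp_repl"
  printf '# wait until mapr.cldb.internal has enough copies in /cldbonly\n'
  printf 'maprcli volume modify -name mapr.cldb.internal -replication %s\n' "$final_repl"
}
```

The third command is needed only if the CLDB volume is present on nodes outside /cldbonly, as described in step 3.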

To move all other volumes to a topology separate from the CLDB-only nodes:

  1. Move all non-CLDB nodes to a non-CLDB topology (e.g. /defaultRack) using the MapR Control System or the following command:
    maprcli node move -serverids <all non-CLDB nodes> -topology /defaultRack
  2. Restrict all existing volumes to the topology /defaultRack using the MapR Control System or the following command:
    maprcli volume move -name <volume> -topology /defaultRack
    All volumes except mapr.cluster.root are re-replicated to the changed topology automatically.

    To prevent subsequently created volumes from encroaching on the CLDB-only nodes, set a default topology that excludes the CLDB-only topology.
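Because the volume move command takes one volume at a time, moving every existing volume means repeating it. The sketch below reads volume names from standard input, skips mapr.cldb.internal, and prints one volume move command per remaining volume. The function name volume_move_cmds is hypothetical, and how you obtain the list of volume names is left to you.

```shell
# Hypothetical generator: read volume names on stdin and print a
# `maprcli volume move` command for each, leaving the CLDB volume alone.
volume_move_cmds() {
  topo="$1"   # target topology path, e.g. /defaultRack
  while read -r vol; do
    case "$vol" in
      ''|mapr.cldb.internal) continue ;;  # skip blanks and the CLDB volume
    esac
    printf 'maprcli volume move -name %s -topology %s\n' "$vol" "$topo"
  done
}
```

For example, piping a file of volume names into volume_move_cmds /defaultRack produces the commands for review. The CLDB volume keeps its /cldbonly topology because it is never emitted.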