Understanding Topology

The MapR software uses node topology to determine where to place replicated copies of data. Node topology describes the locations of nodes and racks in a cluster. You define cluster topology by specifying a topology for each node in the cluster. You can use topology to group nodes by rack or switch, depending on how the physical cluster is arranged and how you want MapR to place replicated data. When cluster topology is defined so that each path matches a physical rack, data is replicated to separate racks, which keeps data available if a rack or node fails.
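To make the idea of a per-node topology concrete, the following minimal Python sketch models a cluster as a mapping from node name to topology path and groups the nodes by path. It is an illustration only, not MapR code: the node names, the paths, and the group_by_topology helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical assignments: each node is given one topology path.
node_topology = {
    "node-a": "/rack-1",
    "node-b": "/rack-1",
    "node-c": "/rack-2",
    "node-d": "/rack-3",
}

def group_by_topology(assignments):
    """Return a dict mapping each topology path to the sorted list of nodes on it."""
    groups = defaultdict(list)
    for node, path in sorted(assignments.items()):
        groups[path].append(node)
    return dict(groups)

if __name__ == "__main__":
    for path, nodes in group_by_topology(node_topology).items():
        print(path, nodes)
```

Grouping nodes this way shows at a glance how many distinct racks are available to hold separate copies of the data.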

Topology paths can be as simple or as complex as needed to correspond to your cluster layout. In a simple cluster, each topology path might consist of the rack only (for example, /rack-1). In a deployment consisting of multiple datacenters, each topology path can be much longer (for example, /europe/uk/london/DC2/room4/row22/rack5). MapR uses topology paths to spread out replicated copies of data, placing each copy on a separate path. By setting each path to correspond to a physical rack, you can ensure that replicated data is distributed across racks to improve fault tolerance.
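The principle of placing each copy on a separate path can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the placement algorithm MapR actually uses: the place_replicas function, the node names, and the first-come selection order are all hypothetical.

```python
def place_replicas(node_paths, replication_factor):
    """Pick at most one node per distinct topology path.

    node_paths maps a node name to its topology path. Nodes are visited in
    name order, and a node is skipped if its path already holds a copy.
    Fewer than replication_factor nodes are returned when there are not
    enough distinct paths.
    """
    chosen = []
    used_paths = set()
    for node, path in sorted(node_paths.items()):
        # Normalize so "/rack-3" and "/rack-3/" count as the same path.
        normalized = "/" + path.strip("/")
        if normalized in used_paths:
            continue
        chosen.append(node)
        used_paths.add(normalized)
        if len(chosen) == replication_factor:
            break
    return chosen

if __name__ == "__main__":
    # Hypothetical cluster: two nodes share /rack-1.
    nodes = {"n1": "/rack-1", "n2": "/rack-1", "n3": "/rack-2", "n4": "/rack-3/"}
    print(place_replicas(nodes, 3))
```

With this sample mapping, n2 is skipped because it shares /rack-1 with n1, so the three copies land on three different racks. If every node were assigned the same path, only one node would be selected, which illustrates why each path should correspond to a distinct physical rack.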