MapR 5.0 Documentation : Setting Up Multi-Master Replication Manually

You can run maprcli commands to set up multi-master replication, first establishing replication in one direction, and then establishing it in the other direction.

For more information about multi-master replication, see Setting Up Multi-Master Replication.

Prerequisites

  • Configure one or more gateways in the destination cluster. See MapR Gateways.
  • If the source and destination clusters are secured, set up security for replication between the clusters. See Configuring MapR Clusters for Replication Between Tables.
  • Run the maprcli table info command on the source table to verify that you have the following permissions:
    • readperm, which is required for reading from the table.
    • replperm, which is required for replicating from the table.
  • Run the maprcli table info command on the destination table (if it already exists) to verify that you have the following permissions:
    • bulkload, which is required for the initial copy of source data into the destination table.
    • replperm, which is required for receiving replicated updates from the source table. 
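
The permission checks above can be scripted. This is a minimal sketch that assumes a POSIX shell on a node where maprcli is on the PATH. The helper names run and check_table_perms and the DRY_RUN variable are hypothetical, not part of MapR. The script only displays the table information. You still need to read readperm, replperm, and bulkload from the output yourself.

```shell
# Hypothetical helper: when DRY_RUN=1, print the command (arguments joined
# by spaces) instead of running it; otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Show table information for the source table and, when given, the
# destination table. Look for readperm and replperm on the source, and
# bulkload and replperm on the destination, in the output.
check_table_perms() {
  src="$1"
  dst="${2:-}"
  run maprcli table info -path "$src" || return
  if [ -n "$dst" ]; then
    run maprcli table info -path "$dst"
  fi
}
```

For example, check_table_perms /mapr/sanfrancisco/customers /mapr/newyork/customers shows both tables. Omit the second path if the destination table does not exist yet.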

Procedure

To set up multi-master replication manually:

  1. Log into both the source and destination clusters.
  2. To set up replication in one direction, use one of these methods:

    Although this procedure describes the steps in the maprcli, you can also set up this replication topology in the MapR Control System (MCS). Log in to MCS, select MapR Tables in the navigation menu, select the table to use as the source table, and click the Replicas tab. The actions for setting up replication are on that tab.


    • For automatic setup in one direction, run the maprcli table replica autosetup command, which performs these steps:

    1. Create the replica table in the destination cluster, with the same column families as the source table.
    2. Declare the new table to be a replica of the source table.
    3. Declare the source table as an upstream source for the replica.
    4. Load a copy of the source data into the replica.
    5. Start replication.
    Here is the syntax of the command:
    maprcli table replica autosetup -path <path to source table> -replica <path to replica>

    For example, to set up replication from the customers table in the sanfrancisco cluster to a new customers table in the newyork cluster, you could use this command:
    maprcli table replica autosetup -path /mapr/sanfrancisco/customers -replica /mapr/newyork/customers

    To set up replication from the customersA table in the sanfrancisco cluster to a new customersB table in the same cluster, you could use this command:
    maprcli table replica autosetup -path /mapr/sanfrancisco/customersA -replica /mapr/sanfrancisco/customersB

    This command takes two optional parameters:

    -columns
         The value is a comma-separated list of items with the following syntax:
         <column family>
         <column family>:<column>
         For example, to replicate only the column family purchases and the column stars in the reviews column family, the value would look like this:
         -columns purchases,reviews:stars

    -synchronous
         This parameter specifies whether replication is synchronous or asynchronous. Asynchronous is the default. The values are yes for synchronous and no for asynchronous.

    • For manual setup in one direction, follow these steps:
    1. Create the replica manually with the maprcli table create command. Use the -copymetafrom option to ensure that the metadata for the replica is identical to the metadata for the source table. Metadata includes column families, access control expressions (ACEs), and other attributes.
      maprcli table create -path <path to the replica> -copymetafrom <path to the source table>

      For example, to create the replica customers in the newyork cluster and use the metadata from the source table in the sanfrancisco cluster, you could use this command:
      maprcli table create -path /mapr/newyork/customers -copymetafrom /mapr/sanfrancisco/customers
    2. Register the replica as a replica of the source table by running the maprcli table replica add command.
      maprcli table replica add -path <path to the source table> -replica <path to the replica> -paused true

      For example, to register the customers table in the newyork cluster as a replica of the customers table in the sanfrancisco cluster, you could use this command:
      maprcli table replica add -path /mapr/sanfrancisco/customers -replica /mapr/newyork/customers -paused true

      The -paused parameter prevents replication from starting as soon as you declare the source table as the upstream source for this replica, which you do in step 4.
    3. Verify that you specified the correct replica by running the maprcli table replica list command.
      maprcli table replica list -path <path to the source table>

      To verify that the customers table in the newyork cluster is a replica of the customers table in the sanfrancisco cluster, you could look at the output of this command:
      maprcli table replica list -path /mapr/sanfrancisco/customers
    4. Authorize replication between the tables by defining the source table as the upstream table for the replica by running the maprcli table upstream add command.
      Requiring the replica to name its upstream table ensures that a table cannot replicate updates to an arbitrary table: replication depends on a two-way agreement between the owners of the two tables.
      maprcli table upstream add -path <path to the replica> -upstream <path to the source table>

      To add the customers table in the sanfrancisco cluster as an upstream source for the customers table in the newyork cluster:
      maprcli table upstream add -path /mapr/newyork/customers -upstream /mapr/sanfrancisco/customers
    5. Verify that you specified the correct source table by running the maprcli table upstream list command.
      maprcli table upstream list -path <path to the replica>

      To verify this in our example scenario, you could use this command:
      maprcli table upstream list -path /mapr/newyork/customers
    6. If you set -paused to true when adding the replica, follow these steps:
      1. Load the replica with data from the source table by using the MapR-DB CopyTable utility.
      2. Start replication with the command maprcli table replica resume.
        Here is the maprcli command:
        maprcli table replica resume -path <path to the source table> -replica <path to the replica>

        For our example scenario, you could use this command:
        maprcli table replica resume -path /mapr/sanfrancisco/customers -replica /mapr/newyork/customers
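  3. To set up replication in the other direction, swap the roles of the two tables: the replica from step 2 is now the source table, and the original source table is now the replica. Then follow these steps: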
    1. Log into both the source and destination clusters.
       
    2. Register the replica as a replica of the source table by running the maprcli table replica add command.
      maprcli table replica add -path <path to the source table> -replica <path to the replica>
       
    3. Verify that you specified the correct replica by running the maprcli table replica list command.
      maprcli table replica list -path <path to the source table>
       
    4. Authorize replication between the tables by defining the source table as the upstream table for the replica by running the maprcli table upstream add command.

      Requiring the replica to name its upstream table ensures that a table cannot replicate updates to an arbitrary table: replication depends on a two-way agreement between the owners of the two tables.

      maprcli table upstream add -path <path to the replica> -upstream <path to the source table>
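
The two directions can be combined into one script. This is a minimal sketch under several assumptions: a POSIX shell with maprcli on the PATH; the hypothetical helper names run, setup_one_direction_manual, setup_one_direction_auto, setup_other_direction, and setup_multi_master; and a "mapr copytable -src ... -dst ..." invocation of the CopyTable utility, whose exact syntax you should confirm in the CopyTable documentation for your release. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical helper: when DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# First direction, manual method: $1 = source table, $2 = new replica.
setup_one_direction_manual() {
  src="$1"
  replica="$2"
  run maprcli table create -path "$replica" -copymetafrom "$src" || return
  run maprcli table replica add -path "$src" -replica "$replica" -paused true || return
  run maprcli table replica list -path "$src" || return
  run maprcli table upstream add -path "$replica" -upstream "$src" || return
  run maprcli table upstream list -path "$replica" || return
  # Initial load while replication is paused.
  # ASSUMPTION: CopyTable is invoked as "mapr copytable -src ... -dst ...".
  run mapr copytable -src "$src" -dst "$replica" || return
  run maprcli table replica resume -path "$src" -replica "$replica"
}

# First direction, automatic method: autosetup creates the replica, registers
# it, declares the upstream source, copies the data, and starts replication.
setup_one_direction_auto() {
  run maprcli table replica autosetup -path "$1" -replica "$2"
}

# Other direction: $1 = the former replica (now the source), $2 = the original
# source (now the replica). That table already exists and holds the data, so
# there is no create or copy step.
setup_other_direction() {
  src="$1"
  replica="$2"
  run maprcli table replica add -path "$src" -replica "$replica" || return
  run maprcli table replica list -path "$src" || return
  run maprcli table upstream add -path "$replica" -upstream "$src"
}

# Usage: setup_multi_master SOURCE REPLICA [manual|auto]
setup_multi_master() {
  if [ "${3:-manual}" = "auto" ]; then
    setup_one_direction_auto "$1" "$2" || return
  else
    setup_one_direction_manual "$1" "$2" || return
  fi
  # Reverse the roles for the second direction.
  setup_other_direction "$2" "$1"
}
```

For example, DRY_RUN=1 setup_multi_master /mapr/sanfrancisco/customers /mapr/newyork/customers prints the ten commands for the example tables. Passing auto as a third argument uses autosetup for the first direction instead. Each function stops at the first failed command, so the second direction is never configured on top of a failed first direction.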

Results

Replication between the two tables is now set up and active.
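
To confirm the result, you can list the replicas and upstream sources of both tables. In a working multi-master setup, each table lists the other both as a replica and as an upstream source. This is a minimal sketch that uses only the maprcli table replica list and table upstream list commands from the procedure; the function name show_replication_state and the DRY_RUN helper are hypothetical.

```shell
# Hypothetical helper: when DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List replicas and upstream sources for both tables. After a successful
# multi-master setup, each table should name the other in both lists.
show_replication_state() {
  for table in "$1" "$2"; do
    run maprcli table replica list -path "$table" || return
    run maprcli table upstream list -path "$table" || return
  done
}
```

For the example scenario, run show_replication_state /mapr/sanfrancisco/customers /mapr/newyork/customers.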

What to do next

If one of the tables goes offline, direct client applications to the other table. When the offline table comes back online, replication between the two tables resumes automatically. When the two tables are back in sync, you can redirect client applications to the original table.

For example, suppose that client applications are using the customers table that is in the cluster sanfrancisco.

The customers table in the sanfrancisco cluster becomes unavailable, so you redirect those client applications to the customers table in the newyork cluster.

After the customers table in the sanfrancisco cluster comes back online, replication back to it starts immediately. Because client applications are not yet using this table, there are no updates to replicate to the table in the newyork cluster.

When the customers table in the sanfrancisco cluster is in sync with the other table, you can redirect your client applications back to it.

Be aware that changes to the structure of a source table are not replicated automatically to replicas. For example, if a new column family is added to the source table and the entire table is being replicated (that is, the maprcli table replica add command did not specify column families or columns to replicate), the new column family is not automatically created in the replica.

If the entire source table is being replicated, you only need to add the new column family to the replica. Updates to the new column family then start replicating immediately, and you do not need to carry out the following steps. Continue with them only if you are replicating a subset of column families and columns.
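
For the whole-table case, adding the column family to the replica can be sketched as follows. This page does not document the command for it. The sketch assumes the maprcli table cf create command with -path and -cfname, and maprcli table cf list with -path; confirm both in the maprcli table cf reference for your release. The function name add_cf_to_replica and the DRY_RUN helper are hypothetical.

```shell
# Hypothetical helper: when DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Whole-table replication only: create the new column family on the replica,
# then list the replica's column families to confirm that it exists.
# ASSUMPTION: "maprcli table cf create -path ... -cfname ..." and
# "maprcli table cf list -path ..." are available in your release.
add_cf_to_replica() {
  replica="$1"
  cf="$2"
  run maprcli table cf create -path "$replica" -cfname "$cf" || return
  run maprcli table cf list -path "$replica"
}
```

For example, add_cf_to_replica /mapr/newyork/customers loyalty adds a column family named loyalty (a hypothetical name) to the replica.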

If you are replicating a subset of column families and columns, follow these steps to add a new column family to the replica:

  1. Pause replication by running the maprcli table replica pause command.
  2. Add the new column family to the replica by running the maprcli table replica edit command.
  3. Copy the data for this column family from the source table into the replica by using the MapR-DB CopyTable utility. Use the -columns parameter to specify the name of the column family.
  4. Resume replication by running the maprcli table replica resume command.
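
The four steps above can be sketched as a single function. It rests on these assumptions:

  • maprcli table replica pause takes the same -path and -replica parameters as maprcli table replica resume.
  • maprcli table replica edit accepts -columns with the full, updated list of column families and columns to replicate, including the new column family, and replaces the existing list.
  • CopyTable is invoked as "mapr copytable -src ... -dst ... -columns ...".

Confirm each assumption against the reference pages for your release. The function name add_cf_to_partial_replica and the DRY_RUN helper are hypothetical.

```shell
# Hypothetical helper: when DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = source table, $2 = replica,
# $3 = full updated column list, $4 = new column family.
# ASSUMPTIONS: pause takes -path/-replica, edit -columns replaces the list,
# and CopyTable is "mapr copytable -src ... -dst ... -columns ...".
add_cf_to_partial_replica() {
  src="$1"
  replica="$2"
  columns="$3"
  new_cf="$4"
  run maprcli table replica pause -path "$src" -replica "$replica" || return
  run maprcli table replica edit -path "$src" -replica "$replica" -columns "$columns" || return
  # Copy only the new column family's existing data into the replica.
  run mapr copytable -src "$src" -dst "$replica" -columns "$new_cf" || return
  run maprcli table replica resume -path "$src" -replica "$replica"
}
```

For example, add_cf_to_partial_replica /mapr/sanfrancisco/customers /mapr/newyork/customers purchases,reviews:stars,loyalty loyalty extends the earlier column list with a hypothetical loyalty column family. The function stops at the first failed command. If a failure occurs after the pause, replication stays paused until you fix the problem and run maprcli table replica resume yourself.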

In MCS, check for replication-related alarms, and check whether synchronous replication has temporarily switched to asynchronous replication. See Table-Replication Alarms.