Setting Up Multi-Master Replication Automatically

You can run a command to instruct MapR-DB to set up multi-master replication from an existing source table.

  • Configure one or more gateways in the destination cluster. See Configuring and Managing MapR Gateways.
  • If the source and destination clusters are secured, set up security for replication between the clusters. See Configuring MapR Clusters for Replication Between Tables.
  • Run the maprcli table info command on the source table to verify that you have the following permissions:
    • readperm, which is required for reading from the table.
    • replperm, which is required for replicating from the table.
  • Run the maprcli table info command on the destination table (if it already exists) to verify that you have the following permissions:
    • bulkloadperm, which is required for the initial copy of source data into the destination table.
    • replperm, which is required for receiving replicated updates from the source table.
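
The permission checks above can be scripted. The following is a minimal sketch, not a definitive implementation: the helper name perm_value is hypothetical, and it assumes that the -json output of maprcli table info lists each permission as a "name":"value" pair. It reads the JSON on standard input, so you can try it against saved output before running it on a cluster.

```shell
# Hypothetical helper: print the value of one permission field found in
# `maprcli table info -json` output read from standard input.
# ASSUMPTION: each permission appears as "name":"value" in that output.
perm_value() {
  grep -o "\"$1\" *: *\"[^\"]*\"" | head -n 1 | sed 's/^[^:]*: *"\(.*\)"$/\1/'
}

# Usage on a cluster (paths taken from the examples in this topic):
#   maprcli table info -path /mapr/sanfrancisco/customers -json | perm_value readperm
#   maprcli table info -path /mapr/sanfrancisco/customers -json | perm_value replperm
#   maprcli table info -path /mapr/newyork/customers -json | perm_value bulkloadperm
```

An empty result means the field was not found in the output, in which case inspect the full output of the command by hand.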

To set up multi-master replication automatically:

Note: Although this procedure describes how to use maprcli, you can also set up this replication topology in the MapR Control Service (MCS). Log in to MCS and select MapR Tables in the navigation menu. Select a table to be the source table and click the Replicas tab. The actions for setting up replication are in this location.

  1. Log in to both the source and destination clusters.
  2. Run the maprcli table replica autosetup command, which performs these steps for you:
    1. Creates a table on the destination cluster. This table has the same column families as the source table.
    2. Declares the new table to be a replica of the source table.
    3. Declares the source table as an upstream source for the replica.
    4. Loads a copy of the source data into the replica.
    5. Declares the source table to be a replica of the new table.
    6. Declares the new table to be an upstream source for the source table.
    7. Starts replication.
    Here is the syntax of the command:
    maprcli table replica autosetup -path <path to source table> -replica <path to replica> -multimaster true

    The -multimaster parameter is optional in the command's general syntax. Set it to true, as shown here, to set up multi-master replication.

    For example, to set up replication between the customers table in the sanfrancisco cluster and a new customers table in the newyork cluster, you could use this command:

    maprcli table replica autosetup -path /mapr/sanfrancisco/customers -replica /mapr/newyork/customers -multimaster true

    To set up replication between the customersA table in the sanfrancisco cluster and a new customersB table in the same cluster, you could use this command:

    maprcli table replica autosetup -path /mapr/sanfrancisco/customersA -replica /mapr/sanfrancisco/customersB -multimaster true

    This command takes two other optional parameters:

    -columns
    For binary tables
    Provide a comma-separated list of column families or columns from a certain column family (column family:qualifier). For example, use the following syntax to replicate the column family purchases and the column stars in the reviews column family: -columns purchases,reviews:stars
    Note: While the column families that you specify must already exist in the source table, the columns that you specify do not have to exist in the destination table for replication to succeed. If the column is added at a later date, replication for that column will start at that time.
    For JSON tables
    Provide a comma-delimited list of fields to replicate. Include the full field path for each field.

    Example

    Suppose your table contains documents that contain this general structure:

    {
         "_id" : "ID",
         "a" :
              {
                   "b" :
                        {
                             "c" : "value"
                        },
                   "e" : "value"
              }
    }

    To replicate fields a, c, and e, you would specify these field paths:

    a,a.b.c,a.e
    Do not use quotation marks and do not include spaces after commas.

    Suppose now that a.b and a.e were custom column families. You want to replicate only the default column family and column family a.e. To do so, you would specify field paths like this:

    ",a.e"

    The empty string before the comma indicates the default column family.

    -synchronous
    This parameter specifies whether replication is synchronous or asynchronous. Asynchronous is the default. The values are yes for synchronous and no for asynchronous.

    See table replica autosetup for the full syntax of this command.
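
The command forms in step 2 can be assembled by a small wrapper. This is a minimal sketch under stated assumptions: the function name autosetup_cmd is hypothetical, it uses only the parameters described in this procedure, and it prints the command instead of running it so that you can review it first.

```shell
# Hypothetical helper: print (do not run) a table replica autosetup command.
# Arguments: source path, replica path, optional -columns value,
# optional -synchronous value (yes or no, as described in this topic).
autosetup_cmd() {
  src="$1"
  replica="$2"
  columns="${3:-}"
  synchronous="${4:-}"
  cmd="maprcli table replica autosetup -path $src -replica $replica -multimaster true"
  # Append each optional parameter only when a value was supplied.
  if [ -n "$columns" ]; then
    cmd="$cmd -columns $columns"
  fi
  if [ -n "$synchronous" ]; then
    cmd="$cmd -synchronous $synchronous"
  fi
  printf '%s\n' "$cmd"
}

# Whole table, asynchronous replication (the default).
autosetup_cmd /mapr/sanfrancisco/customers /mapr/newyork/customers

# Only the purchases column family and the reviews:stars column, synchronously.
autosetup_cmd /mapr/sanfrancisco/customers /mapr/newyork/customers purchases,reviews:stars yes
```

Printing rather than executing keeps the sketch safe to run anywhere. To execute a generated command, paste it into a shell on a node where maprcli is available.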

If one of the tables goes offline, direct client applications to the other table. When the offline table comes back online, replication between the two tables continues automatically. When both tables are back in sync, you can redirect client applications to the original table.

For example, suppose that client applications are using the customers table that is in the cluster sanfrancisco.

The customers table in the sanfrancisco cluster becomes unavailable, so you redirect those client applications to the customers table in the newyork cluster.

After the customers table in the sanfrancisco cluster comes back online, replication back to it starts immediately. Because client applications are not yet using this table, there are no updates to replicate to the table in the newyork cluster.

When the customers table in the sanfrancisco cluster is in sync with the other table, you can redirect your client applications back to it.
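
Before redirecting clients, you may want a scripted check that the recovered table has caught up. The following is a rough sketch only. It assumes that maprcli table replica list with -json reports a field named isUptodate for each replica; confirm that field name against the output on your cluster. The helper name replica_in_sync is hypothetical.

```shell
# Hypothetical helper: succeed when replica-list JSON read from standard input
# reports a replica as up to date.
# ASSUMPTION: the JSON contains a field named isUptodate whose value is
# true or false, with or without surrounding quotation marks.
replica_in_sync() {
  grep -q '"isUptodate" *: *"\{0,1\}true'
}

# Usage on a cluster: list the replicas of the table that clients are
# currently using (newyork) and check the status of the recovered replica.
#   if maprcli table replica list -path /mapr/newyork/customers -json | replica_in_sync; then
#     echo "replica is up to date; clients can be redirected"
#   fi
```

The check matches any replica in the output, so for a table with several replicas, inspect the entry for the specific replica instead.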

Be aware that changes to the structure of a source table are not replicated automatically to replicas. For example, if a new column family is added to the source table and the entire table is being replicated (i.e. the maprcli table replica add command did not specify column families or columns to replicate), the new column family is not automatically created at the replica.

If the entire source table is being replicated, you only need to add the new column family to the replica. Updates to the new column family then start being replicated immediately, and you do not need to carry out the next steps. Continue only if you are replicating a subset of column families and columns.

If you are replicating a subset of column families and columns, follow these steps to add a new column family to the replica:

  1. Pause replication by running the maprcli table replica pause command.
  2. Add the new column family to the replica by running the maprcli table replica edit command.
  3. Copy the data for this column family from the source table into the replica by using the CopyTable utility. Use the -columns parameter to specify the name of the column family.
  4. Resume replication by running the maprcli table replica resume command.
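
The four steps above can be sketched as a command sequence. This is an illustration under stated assumptions, not a definitive procedure: the helper name add_family_cmds is hypothetical, the mapr copytable invocation with -src and -dst is an assumed way to run the CopyTable utility, and the value passed to -columns on table replica edit is assumed to be the complete list of replicated column families including the new one. Check the command references before running anything. The helper only prints the commands.

```shell
# Hypothetical helper: print the commands for adding a column family to a replica.
# Arguments: source table path, replica table path, new column family name,
# value for -columns on `table replica edit`.
# ASSUMPTION: `table replica edit -columns` takes the full list of replicated
# column families, including the new one.
# ASSUMPTION: the CopyTable utility is invoked as `mapr copytable -src ... -dst ...`.
add_family_cmds() {
  src="$1"
  replica="$2"
  family="$3"
  columns="$4"
  # 1. Pause replication.
  printf 'maprcli table replica pause -path %s -replica %s\n' "$src" "$replica"
  # 2. Add the new column family to the replica definition.
  printf 'maprcli table replica edit -path %s -replica %s -columns %s\n' "$src" "$replica" "$columns"
  # 3. Copy the existing data for the new column family.
  printf 'mapr copytable -src %s -dst %s -columns %s\n' "$src" "$replica" "$family"
  # 4. Resume replication.
  printf 'maprcli table replica resume -path %s -replica %s\n' "$src" "$replica"
}

# Example: add the reviews column family when purchases is already replicated.
add_family_cmds /mapr/sanfrancisco/customers /mapr/newyork/customers reviews purchases,reviews
```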

Check for alarms related to replication and whether synchronous replication is switched temporarily to asynchronous replication by looking in MCS. See Table-Replication Alarms.