MapR 5.0 Documentation : Migrating Between Apache HBase Tables and MapR-DB Tables

MapR-DB tables are compatible with the Apache CopyTable tool (org.apache.hadoop.hbase.mapreduce.CopyTable). You can use the CopyTable tool to migrate data from an Apache HBase table to a MapR-DB table, or from a MapR-DB table to an Apache HBase table.

Before You Start

Before migrating your tables to another platform, consider the following points:

  • Schema Changes: Apache HBase and MapR-DB tables have different limits on the number of column families. If you are migrating to MapR-DB tables, you may want to change your table's schema to take advantage of the larger number of column families available. Conversely, if you are migrating from MapR-DB tables to Apache HBase, you may need to adjust your schema to fit within the smaller number of column families available.
  • API Mappings: If you are migrating from Apache HBase to MapR-DB tables, examine your current HBase applications to verify that the APIs and HBase shell commands they use are fully supported.
  • Namespace Mapping: If the migration will take place over a period of time, be sure to plan your table namespace mappings in advance to ease the transition.
  • Implementation Limitations: MapR-DB tables do not support HBase coprocessors. If your existing Apache HBase installation uses coprocessors, plan any necessary modifications in advance. MapR-DB tables support a subset of the regular expressions supported in Apache HBase. Check your existing workflow and HBase applications to verify you are not using unsupported regular expressions.

If you are migrating to MapR-DB tables, be sure to change your Apache HBase client to the MapR client by installing the version of the mapr-hbase package that matches the version of Apache HBase on your source cluster.
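A quick way to confirm the match is to compare the HBase version of the source cluster against the installed mapr-hbase package name. The sketch below hardcodes both strings as stand-ins; in practice the first would come from hbase version on the source cluster and the second from rpm -qa | grep mapr-hbase (or dpkg --list | grep mapr-hbase on Ubuntu):

```shell
# Illustrative version check; both values are hardcoded stand-ins for
# the output of `hbase version` and the package query shown above.
source_hbase_version="0.98.12"
installed_pkg="mapr-hbase-0.98.12.34567-1.noarch"   # hypothetical package name

case "$installed_pkg" in
  *"$source_hbase_version"*) echo "mapr-hbase matches source HBase" ;;
  *) echo "version mismatch: install the matching mapr-hbase package" ;;
esac
```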

See Installing MapR Software for information about MapR installation procedures, including setting up the proper repositories.

Compression Mappings

MapR-DB tables support the LZ4, LZF, and ZLIB compression algorithms.

When you create a MapR-DB table with the Apache HBase API or the HBase shell and specify the LZ4, LZO, or SNAPPY compression algorithm, the resulting MapR-DB table uses the LZ4 compression algorithm.

When you describe a MapR-DB table's schema through the HBase API, the LZ4 and OLDLZF compression algorithms map to the LZ4 compression algorithm.
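The create-time mapping above can be sketched as a simple lookup. This is an illustration of the documented behavior only, not MapR code; it runs in any POSIX shell with no cluster required:

```shell
# Map the compression algorithm requested at table creation (argument)
# to the algorithm a MapR-DB table actually uses (output).
map_compression() {
  case "$1" in
    LZ4|LZO|SNAPPY) echo "LZ4" ;;   # all three map to LZ4 on create
    LZF)            echo "LZF" ;;
    ZLIB)           echo "ZLIB" ;;
    *)              echo "unsupported" ;;
  esac
}

map_compression SNAPPY   # prints LZ4
```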

Copying Data

The Apache CopyTable tool launches a MapReduce job, so the nodes on your cluster must have the correct version of the mapr-hbase package installed. To ensure that your existing HBase applications and workflows continue to work properly, install the mapr-hbase package that matches the HBase version of your existing Apache HBase installation.

  1. Launch the CopyTable tool with the following command, specifying the full destination path of the table with the --new.name parameter:

    hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    -Dhbase.zookeeper.quorum=<ZooKeeper IP Address> \
    --new.name=<destination table path> <source table name>

Example: Migrating an Apache HBase table to a MapR-DB table

This example migrates the existing Apache HBase table mytable01 to the MapR-DB table /user/john/foo/mytable01.

On the node in the MapR cluster where you will launch the CopyTable tool, modify the value of the hbase.zookeeper.quorum property in the hbase-site.xml file to point at a ZooKeeper node in the source cluster. Alternatively, you can specify the value for the hbase.zookeeper.quorum property on the command line. This example specifies the value on the command line.
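If you choose the hbase-site.xml route instead, the property looks like the fragment below. The hostnames zknode1 through zknode3 are placeholders for ZooKeeper nodes in the source cluster:

```xml
<!-- hbase-site.xml on the MapR node that launches CopyTable.
     Replace the placeholder hostnames with the source cluster's
     ZooKeeper quorum. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zknode1,zknode2,zknode3</value>
</property>
```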

  1. Create the destination table. This example uses the HBase shell. The CLI and MapR Control System (MCS) are also viable methods.

    [user@host]$ hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    hbase(main):001:0> create '/user/john/foo/mytable01', 'usernames', 'userpath'
    0 row(s) in 0.2040 seconds
  2. Exit the HBase shell.

    hbase(main):002:0> exit
  3. From the HBase command line, use the CopyTable tool to migrate data.

    [user@host] hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.zookeeper.quorum=zknode1,zknode2,zknode3 --new.name=/user/john/foo/mytable01 mytable01

Verifying Migration

After copying data to the new table, verify that the migration is complete and successful. The following checks are listed in increasing order of complexity:

  1. Verify that the destination table exists. From the HBase shell, use the list command, or use the ls /user/john/foo command from a Linux prompt:

    hbase(main):006:0> list '/user/john/foo'
    1 row(s) in 0.0770 seconds
  2. Check the number of rows in the source table against the destination table with the count command:

    hbase(main):005:0> count '/user/john/foo/mytable01'
    30 row(s) in 0.1240 seconds
  3. Hash each table, then compare the hashes.
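Step 3 can be sketched with ordinary shell tools. The printf lines below fabricate sample dump files so the sketch is self-contained; in practice those files would hold scans of the source and destination tables exported to text (for example, HBase shell scan output redirected to a file), and only the two file names would change:

```shell
# Compare order-independent checksums of two table dumps.
# These sample dumps stand in for exported scans of each table.
printf 'row1 usernames:a=1\nrow2 userpath:b=2\n' > src_scan.txt
printf 'row2 userpath:b=2\nrow1 usernames:a=1\n' > dst_scan.txt

# Sort first so row ordering differences do not cause false mismatches.
src_hash=$(sort src_scan.txt | md5sum | cut -d' ' -f1)
dst_hash=$(sort dst_scan.txt | md5sum | cut -d' ' -f1)

if [ "$src_hash" = "$dst_hash" ]; then
  echo "tables match"
else
  echo "tables differ"
fi
```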

Decommissioning the Source

After verifying a successful migration, you can decommission the source nodes where the tables were originally stored.

Decommissioning a MapR Node

Before you start, drain the node of data by moving the node to the /decommissioned physical topology. All the data on a node in the /decommissioned topology is migrated to volumes and nodes in the /data topology.

Run the following command to check if a given volume is present on the node:

    maprcli dump volumenodes -volumename <volume> -json | grep <ip:port>

Run this command for each non-local volume in your cluster to verify that the node being decommissioned is not storing any volume data.

  1. Change to the root user (or use sudo for the following commands).
  2. Stop the Warden:
    service mapr-warden stop
  3. If ZooKeeper is installed on the node, stop it:
    service mapr-zookeeper stop
  4. Determine which MapR packages are installed on the node:
    • dpkg --list | grep mapr (Ubuntu)
    • rpm -qa | grep mapr (Red Hat or CentOS)
  5. Remove the packages by issuing the appropriate command for the operating system, followed by the list of services. Examples:
    • apt-get purge mapr-core mapr-cldb mapr-fileserver (Ubuntu)
    • yum erase mapr-core mapr-cldb mapr-fileserver (Red Hat or CentOS)
  6. Remove the /opt/mapr directory to remove any instances of hostid, hostname, zkdata, and zookeeper left behind by the package manager.
  7. Remove any MapR cores in the /opt/cores directory.
  8. If the node you have decommissioned is a CLDB node or a ZooKeeper node, run configure.sh on all other nodes in the cluster (see Configuring the Node).

Decommissioning Apache HBase Nodes

To decommission nodes running Apache HBase, follow these steps for each node:

  1. From the HBase shell, disable the Region Load Balancer by setting the value of balance_switch to false:

    hbase(main):001:0> balance_switch false
  2. Leave the HBase shell by typing exit.
  3. Run the graceful stop script to stop the HBase RegionServer:

    [user@host] ./bin/graceful_stop.sh <hostname>

    The script does not resolve a hostname from an IP address, so do not pass an IP address to the script. Check the list of RegionServers in the Apache HBase Master UI to determine the hostname of the node being decommissioned.