MapR-DB tables can be copied with the Apache CopyTable tool (org.apache.hadoop.hbase.mapreduce.CopyTable). You can use the CopyTable tool to migrate data from an Apache HBase table to a MapR-DB table, or from a MapR-DB table to an Apache HBase table.
Before You Start
Before migrating your tables to another platform, consider the following points:
- Schema Changes: Apache HBase and MapR-DB tables have different limits on the number of column families. If you are migrating to MapR, you may want to change your table's schema to take advantage of the larger number of column families available. Conversely, if you are migrating from MapR-DB tables to Apache HBase, you may need to adjust your schema to fit within the smaller number of column families supported.
- API Mappings: If you are migrating from Apache HBase to MapR-DB tables, examine your current HBase applications to verify that the APIs and HBase shell commands they use are fully supported.
- Namespace Mapping: If the migration will take place over a period of time, be sure to plan your table namespace mappings in advance to ease the transition.
- Implementation Limitations: MapR-DB tables do not support HBase coprocessors. If your existing Apache HBase installation uses coprocessors, plan any necessary modifications in advance. MapR-DB tables support a subset of the regular expressions supported in Apache HBase. Check your existing workflow and HBase applications to verify you are not using unsupported regular expressions.
If you are migrating to MapR-DB tables, be sure to change your Apache HBase client to the MapR client by installing the version of the
mapr-hbase package that matches the version of Apache HBase on your source cluster.
See Installing MapR Software for information about MapR installation procedures, including setting up the proper repositories.
MapR-DB tables support the LZ4, LZF, and ZLIB compression algorithms.
When you create a MapR-DB table with the Apache HBase API or the HBase shell and specify the LZ4, LZO, or SNAPPY compression algorithm, the resulting MapR-DB table uses the LZ4 compression algorithm.
When you describe a MapR-DB table's schema through the HBase API, the LZ4 and OLDLZF compression algorithms map to the LZ4 compression algorithm.
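For example, creating a table from the HBase shell with SNAPPY compression illustrates this mapping (the table path and column-family name below are illustrative):

```
hbase> create '/user/john/foo', {NAME => 'cf1', COMPRESSION => 'SNAPPY'}
hbase> describe '/user/john/foo'
```

Per the mapping described above, the describe output should report the column family's compression as LZ4 rather than SNAPPY.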
The Apache CopyTable tool launches a MapReduce job, so the nodes on your cluster must have the correct version of the mapr-hbase package installed. To ensure that your existing HBase applications and workflows work properly, install the mapr-hbase package that provides the same version of HBase as your existing Apache HBase installation.
Launch the CopyTable tool with the following command, specifying the full destination path of the table with the --new.name parameter.
Example: Migrating an Apache HBase table to a MapR-DB table
This example migrates the existing Apache HBase table mytable01 to the MapR-DB table /user/john/foo.
On the node in the MapR cluster where you will launch the CopyTable tool, modify the value of the hbase.zookeeper.quorum property in the hbase-site.xml file to point at a ZooKeeper node in the source cluster. Alternatively, you can specify the value of the hbase.zookeeper.quorum property on the command line; this example specifies the value on the command line.
Exit the HBase shell.
From the HBase command line, use the CopyTable tool to migrate data.
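Putting the example together, and assuming a placeholder ZooKeeper hostname for the source cluster, the CopyTable invocation might look like this (verify the -D and --new.name options against your HBase version):

```
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  -Dhbase.zookeeper.quorum=<zookeeper-node-on-source-cluster> \
  --new.name=/user/john/foo mytable01
```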
After copying data to the new table, verify that the migration is complete and successful. In increasing order of complexity:
- Verify that the destination table exists. From the HBase shell, use the list command, or use the ls /user/john/foo command from a Linux prompt.
- Check the number of rows in the source table against the destination table with the count command.
- Hash each table, then compare the hashes.
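The hash comparison can be sketched with ordinary shell tools, assuming each table has first been dumped to a text file of key/value pairs (the dump files below are fabricated stand-ins; in practice they would come from an export of each table). Sorting before hashing makes the digest independent of row order:

```shell
# Stand-in dumps; real ones would be produced by exporting each table.
printf 'row1\tv1\nrow2\tv2\n' > /tmp/src.dump
printf 'row2\tv2\nrow1\tv1\n' > /tmp/dst.dump

# Sort each dump so row order cannot affect the digest, then hash it.
src_hash=$(sort /tmp/src.dump | md5sum | cut -d' ' -f1)
dst_hash=$(sort /tmp/dst.dump | md5sum | cut -d' ' -f1)

# Equal digests mean the two dumps hold the same rows and values.
[ "$src_hash" = "$dst_hash" ] && echo "tables match" || echo "tables differ"
```

Here the two stand-in dumps contain the same rows in different order, so the script prints "tables match".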
Decommissioning the Source
After verifying a successful migration, you can decommission the source nodes where the tables were originally stored.
Decommissioning a MapR Node
Before you start, drain the node of data by moving the node to the /decommissioned physical topology. All of the data on a node in the /decommissioned topology is migrated to volumes and nodes elsewhere in the cluster.
To confirm the node has been drained, check whether any given volume is still present on the node. Run this check for each non-local volume in your cluster to verify that the node being decommissioned is not storing any volume data.
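One way to run this check is with maprcli; the exact invocation below is an assumption based on standard MapR tooling, and the volume name and node address are placeholders:

```
maprcli dump volumenodes -volumename <volume-name> -json | grep <IP-address-of-node>
```

If the grep produces no output for any non-local volume, the node is no longer storing volume data.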
- Change to the root user (or use sudo for the following commands).
- Stop the Warden:
service mapr-warden stop
- If ZooKeeper is installed on the node, stop it:
service mapr-zookeeper stop
- Determine which MapR packages are installed on the node:
dpkg --list | grep mapr (Ubuntu)
rpm -qa | grep mapr (Red Hat or CentOS)
- Remove the packages by issuing the appropriate command for the operating system, followed by the list of services. Examples:
apt-get purge mapr-core mapr-cldb mapr-fileserver (Ubuntu)
yum erase mapr-core mapr-cldb mapr-fileserver (Red Hat or CentOS)
- Remove the /opt/mapr directory to remove any instances of zookeeper left behind by the package manager.
- Remove any MapR core files left on the node.
- If the node you have decommissioned is a CLDB node or a ZooKeeper node, run configure.sh on all other nodes in the cluster (see Configuring the Node).
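A typical configure.sh run lists the cluster's CLDB and ZooKeeper nodes; the hostnames below are placeholders, and the ports shown are MapR's default CLDB (7222) and ZooKeeper (5181) ports:

```
/opt/mapr/server/configure.sh -C cldb1:7222 -Z zk1:5181,zk2:5181,zk3:5181
```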
Decommissioning Apache HBase Nodes
To decommission nodes running Apache HBase, follow these steps for each node:
- From the HBase shell, disable the Region Load Balancer by setting balance_switch to false:
balance_switch false
- Leave the HBase shell by typing exit.
- Run the graceful stop script to stop the HBase RegionServer. Note that the graceful_stop.sh script does not look up the hostname for an IP address, so do not pass an IP address to the script. Check the list of RegionServers in the Apache HBase Master UI to determine the hostname for the node being decommissioned.
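Assuming a standard HBase installation directory on the node (the path below is an assumption; adjust it to your layout), the script is run from the HBase installation directory with the hostname of the RegionServer being decommissioned:

```
cd /opt/mapr/hbase/hbase-<version>
./bin/graceful_stop.sh <hostname-of-regionserver>
```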