MapR 5.0 Documentation : Setting Up Compression with HBase

Using compression with HBase reduces the number of bytes transmitted over the network and stored on disk. These benefits often outweigh the performance cost of compressing the data on every write and decompressing it on every read.

GZip Compression

GZip compression is included with most Linux distributions and works natively with HBase. To use GZip compression, specify it in the per-column-family compression flag when creating tables in the HBase shell. Example:

 create 'mytable', {NAME=>'colfam:', COMPRESSION=>'gz'}
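
To confirm that the column family picked up the codec, the table schema can be printed with the HBase shell's describe command. The following is a minimal sketch that assumes the hbase launcher is on the PATH and that the shell accepts statements on standard input; the helper name describe_stmt is hypothetical and only builds the statement text.

```shell
#!/bin/sh
# Hypothetical helper: print a describe statement for the given table.
describe_stmt() {
    printf "describe '%s'\n" "$1"
}

# Run the statement only when the hbase launcher is available.
# The printed schema should list the COMPRESSION attribute of each family.
if command -v hbase >/dev/null 2>&1; then
    describe_stmt mytable | hbase shell
fi
```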

LZO Compression

Lempel-Ziv-Oberhumer (LZO) is a lossless data compression algorithm, included in most Linux distributions, that is designed for decompression speed.

Setting up LZO compression for use with HBase:

  1. Make sure HBase is installed on the nodes where you plan to run it. See Planning the Cluster and Installing MapR Software for more information.
  2. On each HBase node, ensure the native LZO base library is installed:
    • On Ubuntu: apt-get install liblzo2-dev liblzo2-2
    • On Red Hat or CentOS: yum install lzo-devel lzo
  3. Check out the native connector library from http://svn.codespot.com/a/apache-extras.org/hadoop-gpl-compression/
    • For 0.20.2 check out branches/branch-0.1

      svn checkout http://svn.codespot.com/a/apache-extras.org/hadoop-gpl-compression/branches/branch-0.1/
  4. Set the compiler flags and build the native connector library:

    $ export CFLAGS="-m64"
    $ ant compile-native
    $ ant jar
    
  5. Create a directory for the native libraries (use TAB completion to fill in the <version> placeholder):

    mkdir -p /opt/mapr/hbase/hbase-<version>/lib/native/Linux-amd64-64/
  6. Copy the build results into the appropriate HBase directories on every HBase node. Example:

    $ cp build/native/Linux-amd64-64/lib/libgplcompression.* /opt/mapr/hbase/hbase-<version>/lib/native/Linux-amd64-64/
    
  7. Download the hadoop-lzo compression library from https://github.com/twitter/hadoop-lzo.
  8. Create a symbolic link under /opt/mapr/hbase/hbase-<version>/lib/native/Linux-amd64-64/ that points to the system liblzo2.so.2 library:
    • On Ubuntu:

      ln -s /usr/lib/x86_64-linux-gnu/liblzo2.so.2 /opt/mapr/hbase/hbase-<version>/lib/native/Linux-amd64-64/liblzo2.so.2
    • On Red Hat or CentOS:

      ln -s /usr/lib64/liblzo2.so.2 /opt/mapr/hbase/hbase-<version>/lib/native/Linux-amd64-64/liblzo2.so.2
  9. Restart the RegionServer:

    maprcli node services -hbregionserver restart -nodes <hostname>
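
Before restarting, it can help to check that the files from steps 6 and 8 are actually in place on each node. The following is a minimal sketch, not part of the MapR tooling; the function name check_lzo_native is hypothetical, and the directory to inspect is passed as an argument.

```shell
#!/bin/sh
# Hypothetical helper: report whether the LZO native libraries described
# in the steps above are present in the given directory.
check_lzo_native() {
    dir=$1
    rc=0
    # The connector build produces libgplcompression.* files (step 6).
    if ! ls "$dir"/libgplcompression.* >/dev/null 2>&1; then
        echo "missing: $dir/libgplcompression.*"
        rc=1
    fi
    # -e follows symbolic links, so a dangling link counts as missing (step 8).
    if [ ! -e "$dir/liblzo2.so.2" ]; then
        echo "missing: $dir/liblzo2.so.2"
        rc=1
    fi
    return $rc
}
```

Call the function with the native library directory created in step 5, with the <version> placeholder filled in. An exit status of 0 means both the connector library and the liblzo2.so.2 link were found.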
    

Once LZO is set up, you can specify it in the per-column-family compression flag when creating tables in the HBase shell. Example:

 create 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}
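
HBase ships a utility class, org.apache.hadoop.hbase.util.CompressionTest, that writes and reads back a small file with a given codec. Running it on each RegionServer node is one way to check that the LZO libraries load before tables depend on them. The wrapper below is a sketch that assumes the hbase launcher is on the PATH; the function name test_codec and the scratch path in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around HBase's CompressionTest utility.
# Arguments: codec name (for example lzo), then a scratch file path or URI.
test_codec() {
    if ! command -v hbase >/dev/null 2>&1; then
        echo "hbase launcher not found on PATH" >&2
        return 127
    fi
    # CompressionTest takes the path first and the codec name second.
    hbase org.apache.hadoop.hbase.util.CompressionTest "$2" "$1"
}
```

For example, test_codec lzo file:///tmp/lzo-check would exercise the LZO codec against a local scratch file. A non-zero exit status suggests the native libraries were not found.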

Snappy Compression

The Snappy compression algorithm is optimized for speed rather than compression ratio. Snappy compression is included in the core MapR installation, and no additional configuration is required.
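
By analogy with the GZip and LZO examples, a table that uses Snappy would be created by passing snappy as the codec name. The sketch below assumes that the codec name follows the same lowercase convention as gz and lzo, that the hbase launcher is on the PATH, and that the HBase shell accepts statements on standard input; the helper name create_stmt is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a create statement for a table with one
# column family. Arguments: table name, column family name, codec name.
create_stmt() {
    printf "create '%s', {NAME=>'%s', COMPRESSION=>'%s'}\n" "$1" "$2" "$3"
}

# Run the statement only when the hbase launcher is available.
if command -v hbase >/dev/null 2>&1; then
    create_stmt mytable colfam: snappy | hbase shell
fi
```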