MapR 5.0 Documentation : Hive and HCatalog Integration

The HCatalog library provides applications with a table view of the MapR-FS layer in your cluster, expanding your application's options beyond read/write data streams to table operations such as get row and store row. The HCatalog library stores the metadata required for its operations in the Hive Metastore.

The hcat utility can execute any of the data definition language (DDL) commands available in Hive that do not involve launching a MapReduce job. Internally, the hcat utility passes DDL commands to the hive program. Data stored in the MapR filesystem is serialized and deserialized at the record level through InputStorageFormats and OutputStorageFormats objects; the fields within a record are parsed with SerDes.
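
For example, hcat accepts both inline commands and script files; the table and script names below are illustrative:

    # Execute a single DDL statement (metadata only; no MapReduce job is launched)
    hcat -e "create table mytable(key int, value string)"

    # Execute DDL statements from a script file
    hcat -f myscript.hcatalog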

The hive-json-serde-0.2.jar JSON serializer/deserializer does not implement a serialize() method and therefore does not function.

The WebHCat server provides a REST-like web API for HCatalog. For more information about using WebHCat, see Hive and WebHCat Integration.
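
For instance (a sketch, assuming WebHCat is running on its default port, 50111; substitute your server's hostname and user name), the REST API exposes endpoints such as status and ddl:

    # Check that the WebHCat server is up
    curl "http://<webhcat_host>:50111/templeton/v1/status"

    # List databases through the ddl resource
    curl "http://<webhcat_host>:50111/templeton/v1/ddl/database?user.name=<user>"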

This page contains the following topics:

  - Accessing HCatalog Tables from Hive
  - Loading and Retrieving Data from Pig
  - Running MapReduce Applications
  - Running Non-MapReduce Applications

Accessing HCatalog Tables from Hive

To access tables created in HCatalog in Hive, use the following command to append paths to your HADOOP_CLASSPATH environment variable:

export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$HCAT_HOME/share/hcatalog/storage-handlers/hbase/lib/hbase-storage-handler-<version>.jar:$HCAT_HOME/share/hcatalog/hcatalog-core-<version>-mapr.jar:$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-<version>-mapr.jar:$HCAT_HOME/share/hcatalog/hcatalog-server-extensions-<version>-mapr.jar
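
Once the classpath is set, a quick way to confirm that Hive can see an HCatalog-managed table is to list and describe it from the Hive CLI (the table name is a placeholder for one you created with hcat):

    hive -e "show tables"
    hive -e "describe formatted <table_name>"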

Loading and Retrieving Data from Pig

To use the HCatalog library's HCatLoader and HCatStorer classes to load and retrieve data from Pig:

  1. Create a table with the hcat utility.

    hcat -e "create table hcatpig(key int, value string)"
    
  2. Verify that the table and table definition both exist.

    hcat -e "describe formatted hcatpig"
    
  3. Load data into the table from Pig. Copy the $HIVE_HOME/examples/files/kv1.txt file into the MapR filesystem, then start Pig and load the file with the following commands:

    pig -useHCatalog -Dmapred.job.tracker=maprfs:/// -Dfs.default.name=maprfs://CLDB_Host:7222/
    grunt> A = LOAD 'kv1.txt' USING PigStorage('\u0001') AS (key:int, value:chararray);
    grunt> STORE A INTO 'hcatpig' USING org.apache.hive.hcatalog.pig.HCatStorer();
    
  4. Retrieve data from the hcatpig table with the following Pig commands:

    grunt> B = LOAD 'default.hcatpig' USING org.apache.hive.hcatalog.pig.HCatLoader();
    grunt> dump B; -- this should display the records in kv1.txt
    

    Another way to verify that the data was loaded into the hcatpig table is to look at the contents of maprfs:///user/hive/warehouse/hcatpig/, as shown below. HCatalog tables are also accessible from the Hive CLI, and all Hive queries work on HCatalog tables.
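
    A minimal sketch of that check from the command line, assuming the default warehouse location shown above:

    # List the files backing the hcatpig table
    hadoop fs -ls /user/hive/warehouse/hcatpig

    # Display the stored records
    hadoop fs -cat /user/hive/warehouse/hcatpig/*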

Running MapReduce Applications

This example uses a sample MapReduce program named HCatalogMRTest.java. The program is attached to this page and can be downloaded by clicking Tools > Attachments.

  1. From the command line, issue the following commands to define the environment:

    export LIB_JARS=$HCAT_HOME/share/hcatalog/hcatalog-core-<version>-mapr.jar,$HIVE_HOME/lib/hive-metastore-<version>-mapr.jar,$HIVE_HOME/lib/libthrift-<version>.jar,$HIVE_HOME/lib/hive-exec-<version>-mapr.jar,$HIVE_HOME/lib/libfb303-<version>.jar,$HIVE_HOME/lib/jdo2-api-<version>-ec.jar,$HIVE_HOME/lib/slf4j-api-<version>.jar

    export HADOOP_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-core-<version>-mapr.jar:$HIVE_HOME/lib/hive-metastore-<version>-mapr.jar:$HIVE_HOME/lib/libthrift-<version>.jar:$HIVE_HOME/lib/hive-exec-<version>-mapr.jar:$HIVE_HOME/lib/libfb303-<version>.jar:$HIVE_HOME/lib/jdo2-api-<version>-ec.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf:$HIVE_HOME/lib/slf4j-api-<version>.jar
    
  2. Compile HCatalogMRTest.java:

    javac -cp `hadoop classpath`:${HCAT_HOME}/share/hcatalog/hcatalog-core-<version>-mapr.jar HCatalogMRTest.java -d .
    
  3. Create a JAR file:

    jar -cf hcatmrtest.jar org
    
  4. Create an output table:

    hcat -e "create table hcatpigoutput(key int, value int)"
    
  5. Run the job:

    hadoop --config $HADOOP_HOME/conf jar ./hcatmrtest.jar org.myorg.HCatalogMRTest -libjars $LIB_JARS hcatpig hcatpigoutput
    

    At the end of the job, the hcatpigoutput table should have entries in the form key, count.
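
    As a quick check (a sketch using the Hive CLI), you can query the output table:

    hive -e "select * from hcatpigoutput limit 10"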

Running Non-MapReduce Applications

This example uses a sample non-MapReduce program named TestReaderWriter.java. The program is attached to this page and can be downloaded by clicking Tools > Attachments.

  1. Add the following JAR files to your HADOOP_CLASSPATH environment variable with the following command:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/mapr/hive/hive-<version>/lib/antlr-runtime-3.4.jar:/opt/mapr/hive/hive-<version>/lib/hive-cli-<version>-mapr.jar
    
  2. Compile the test program with the following command:

    javac -cp `hadoop classpath`:${HCAT_HOME}/share/hcatalog/hcatalog-core-<version>-mapr.jar TestReaderWriter.java -d <directory>
    
  3. Create a JAR file with the following command:

    jar -cf hcatrwtest.jar org
    
  4. Run the job with the following command:

    hadoop jar /root/<username>/hcatalog/hcatrwtest.jar org.apache.hive.hcatalog.data.TestReaderWriter -libjars $LIB_JARS
    

The last command should result in a table named mytbl that is populated with data.
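
To verify the result (a quick sketch using the table name from this example):

    # Confirm the table definition exists
    hcat -e "describe formatted mytbl"

    # Display a few rows
    hive -e "select * from mytbl limit 10"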

Attachments:

TestReaderWriter.java (text/plain)
hcat-product.jpg (image/jpeg)
HCatalogMRTest.java (text/plain)