MapR 4.0.x Documentation : Hive

Apache Hive is a data warehouse system for Hadoop that uses a SQL-like language called Hive Query Language (HQL) to query structured data stored in a distributed filesystem. For more information about Hive, see the Apache Hive project page.

See also Upgrading Hive and Working with Hive.

Installing Hive, HiveServer2, and Hive Metastore

The following procedures use the operating system package managers to download and install Hive from the MapR Repository. If you want to install this component manually from package files, see Packages and Dependencies for MapR Software.

Hive is distributed as three packages:

  • mapr-hive - contains the following components:
    • The core Hive package.
    • HiveServer2 - allows multiple concurrent connections to the Hive server over a network.
    • Hive Metastore - stores the metadata for Hive tables and partitions in a relational database.
  • mapr-hiveserver2 - allows HiveServer2 to be managed by the warden, so you can start and stop HiveServer2 using maprcli or the MapR Control System. The mapr-hive package is a dependency and is installed automatically if you install mapr-hiveserver2. At installation time, HiveServer2 is started automatically.

  • mapr-hivemetastore - allows Hive Metastore to be managed by the warden, so you can start and stop Hive Metastore using maprcli or the MapR Control System. The mapr-hive package is a dependency and is installed automatically if you install mapr-hivemetastore. At installation time, the Hive Metastore is started automatically.

This procedure is to be performed on a MapR cluster (see the Advanced Installation Topics) or client (see Setting Up the Client).

Make sure the environment variable JAVA_HOME is set correctly. Example:

# export JAVA_HOME=/usr/lib/jvm/java-6-sun

Make sure the environment variable HIVE_HOME is set correctly. Example:

# export HIVE_HOME=/opt/mapr/hive/hive-<version>

You can set these system variables by using the shell command line or by updating files such as /etc/profile or ~/.bash_profile. See the Linux documentation for more details about setting system environment variables.
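For example, a minimal sketch of ~/.bash_profile entries (the JDK path and Hive version shown are placeholders; substitute your own):

# Placeholder paths -- adjust to your installed JDK and Hive version
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HIVE_HOME=/opt/mapr/hive/hive-0.13
export PATH=$PATH:$HIVE_HOME/bin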

After Hive is installed, the executable is located at: /opt/mapr/hive/hive-<version>/bin/hive

To install Hive on an Ubuntu cluster:

  1. Execute the following commands as root or using sudo.
  2. Update the list of available packages:

    apt-get update
  3. On each planned Hive node, install Hive.
    • To install only Hive:
      apt-get install mapr-hive
       
    • To install Hive and HiveServer2:

      apt-get install mapr-hiveserver2 
       
    • To Install Hive and Hive Metastore:
      apt-get install mapr-hivemetastore
       
    • To install Hive, Hive Metastore, and HiveServer2:

      apt-get install mapr-hivemetastore mapr-hiveserver2 
  4. Run configure.sh:
    /opt/mapr/server/configure.sh -R

This procedure installs Hive 0.13.0. To install an earlier version, specify it in the package names. Make sure to install the same version of all packages. Example:

apt-get install mapr-hive=0.13.xxx

You can determine the available versions with the apt-cache madison mapr-hive command. See the Hive Release Notes for a list of fixes and new features.

To install Hive on a Red Hat or CentOS cluster:

  1. Execute the following commands as root or using sudo.
  2. On each planned Hive node, install Hive.
    • To install only Hive:
      yum install mapr-hive
    • To install Hive and HiveServer2:

      yum install mapr-hiveserver2 
    • To Install Hive and Hive Metastore:
      yum install mapr-hivemetastore
    • To install Hive, Hive Metastore, and HiveServer2:

      yum install mapr-hivemetastore mapr-hiveserver2 
  3. Run configure.sh:
    /opt/mapr/server/configure.sh -R


This procedure installs Hive 0.13.0. To install an earlier version, specify it in the package names. Make sure to install the same version of all packages.

Example
yum install mapr-hive-0.13.xxx
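To list the package versions available in the repository (the Red Hat counterpart of apt-cache madison), you can use yum:

yum --showduplicates list mapr-hive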

See the Hive Release Notes for a list of fixes and new features.


Getting Started with Hive

In this tutorial, you'll create a Hive table, load data from a tab-delimited text file, and run a couple of basic queries against the table.

If you are using HiveServer2, use the Beeline CLI instead of the Hive shell shown below. For details on setting up HiveServer2 and starting Beeline, see Using HiveServer2.

First, make sure you have downloaded the sample table: right-click sample-table.txt (attached to this page), select Save Link As... from the pop-up menu, choose a directory to save to, and click OK. If you are working on the MapR Virtual Machine, you will load the file from the virtual machine's local file system (not the cluster storage layer), so save the file in the MapR home directory (for example, /home/mapr).

Take a look at the source data

First, take a look at the contents of the file using the terminal:

  1. Make sure you are in the Home directory where you saved sample-table.txt (type cd ~ if you are not sure).
  2. Type cat sample-table.txt to display the following output.
mapr@mapr-desktop:~$ cat sample-table.txt
1320352532 1001 http://www.mapr.com/doc http://www.mapr.com 192.168.10.1
1320352533 1002 http://www.mapr.com http://www.example.com 192.168.10.10
1320352546 1001 http://www.mapr.com http://www.mapr.com/doc 192.168.10.1

Notice that the file consists of only three lines, each of which contains a row of data fields separated by the TAB character. The data in the file represents a web log.

Create a table in Hive and load the source data:

  1. Type the following command to start the Hive shell, using tab-completion to expand the <version>:

    /opt/mapr/hive/hive-<version>/bin/hive
  2. At the hive> prompt, type the following command to create the table:

    CREATE TABLE web_log(viewTime INT, userid BIGINT, url STRING, referrer STRING, ip STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
  3. Type the following command to load the data from sample-table.txt into the table:

    LOAD DATA LOCAL INPATH '/home/mapr/sample-table.txt' INTO TABLE web_log;

Run basic queries against the table:

  • Try the simplest query, one that displays all the data in the table:

    SELECT web_log.* FROM web_log;

    This query would be inadvisable with a large table, but with the small sample table it returns very quickly.

  • Try a simple SELECT to extract only data that matches a desired string:

    SELECT web_log.* FROM web_log WHERE web_log.url LIKE '%doc';

    This query launches a MapReduce job to filter the data.
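  • As a further exercise, try an aggregate query against the same table; a sketch, assuming the web_log table created above:

    SELECT url, COUNT(*) AS hits FROM web_log GROUP BY url;

    Like the LIKE query above, this aggregation launches a MapReduce job.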

Starting Hive

You can start the Hive shell from HIVE_HOME/bin/ with the hive command. Example:

/opt/mapr/hive/hive-<version>/bin/hive

When the Hive shell starts, it reads an initialization file called .hiverc, located in the HIVE_HOME/bin/ or $HOME/ directory. You can edit this file to set custom parameters or commands that initialize the Hive command-line environment, one command per line.
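For example, a minimal sketch of a .hiverc file (the property values are placeholders):

set hive.cli.print.header=true;
set hive.exec.scratchdir=/myvolume/tmp;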

When you run the Hive shell, you can specify a file of initialization commands with the -i option. Example:

/opt/mapr/hive/hive-<version>/bin/hive -i <filename>

Managing Hive Metastore

As of MapR version 3.0.2, the Hive Metastore is started automatically by the warden at installation time if the mapr-hivemetastore package is installed. It is sometimes necessary to start or stop the service (for example, after changing the configuration). You can start and stop Hive Metastore in two ways:

  • Using the maprcli node services command - Using this command, you can start Hive Metastore on multiple nodes at one time.
  • Using the MapR Control System

To start Hive Metastore using the maprcli:

  1. Make a list of nodes on which Hive Metastore is configured.
  2. Issue the maprcli node services command, specifying the nodes on which Hive Metastore is configured, separated by spaces. Example:

     maprcli node services -name hivemeta -action start -nodes node001 node002 node003
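     To build the list of nodes for step 1, you can query the cluster with maprcli; a sketch (output columns vary by MapR version):

     maprcli node list -columns service | grep hivemeta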

To stop Hive Metastore using the maprcli:

  1. Make a list of nodes on which Hive Metastore is configured.
  2. Issue the maprcli node services command, specifying the nodes on which Hive Metastore is configured, separated by spaces. Example:

     maprcli node services -name hivemeta -action stop -nodes node001 node002 node003

To start or stop Hive Metastore using the MapR Control System:

  1. In the Navigation pane, expand the Cluster Views pane and click Dashboard.
  2. In the Services pane, click Hive Metastore to open the Nodes screen displaying all the nodes on which Hive Metastore is configured.
  3. On the Nodes screen, click the hostname of each node to display its Node Properties screen.
  4. On each Node Properties screen, use the Stop/Start button in the Hive Metastore row under Manage Services to start or stop Hive Metastore.

Managing HiveServer2

HiveServer2 is started automatically at installation time by the warden if the mapr-hiveserver2 package is installed. It is sometimes necessary to start or stop the service (for example, after changing the configuration). You can start and stop HiveServer2 in two ways:

  • Using the maprcli node services command - Using this command, you can start HiveServer2 on multiple nodes at one time.
  • Using the MapR Control System

To start HiveServer2 using the maprcli:

  1. Make a list of nodes on which HiveServer2 is configured.
  2. Issue the maprcli node services command, specifying the nodes on which HiveServer2 is configured, separated by spaces. Example:

     maprcli node services -name hs2 -action start -nodes node001 node002 node003

To stop HiveServer2 using the maprcli:

  1. Make a list of nodes on which HiveServer2 is configured.
  2. Issue the maprcli node services command, specifying the nodes on which HiveServer2 is configured, separated by spaces. Example:

     maprcli node services -name hs2 -action stop -nodes node001 node002 node003

To start or stop HiveServer2 using the MapR Control System:

  1. In the Navigation pane, expand the Cluster Views pane and click Dashboard.
  2. In the Services pane, click HiveServer2 to open the Nodes screen displaying all the nodes on which HiveServer2 is configured.
  3. On the Nodes screen, click the hostname of each node to display its Node Properties screen.
  4. On each Node Properties screen, use the Stop/Start button in the HiveServer2 row under Manage Services to start or stop HiveServer2.

Configuring Hive Directories

MapR configures a default location for the Hive scratch directory, the warehouse directory, and the error logs directory. 

The following sections provide details on each Hive directory and steps to configure a different directory location:

Hive Scratch Directory

By default, MapR configures the Hive scratch directory to be /user/<username>/tmp/hive. This default is defined in the $HIVE_HOME/conf/hive-default.xml.template file.

To modify this parameter, perform one of the following operations:

  • Set this parameter in the hive-site.xml.
    Copy the hive.exec.scratchdir property elements from the $HIVE_HOME/conf/hive-default.xml.template file and paste them into an XML configuration element in the $HIVE_HOME/conf/hive-site.xml file. Then, modify the value elements for these directories in the hive-site.xml file.  
  • Set this parameter from the Hive shell. Example:

    hive> set hive.exec.scratchdir=/myvolume/tmp;
You will see better performance when queries import data from a table that is in the same MapR volume as the Hive scratch directory.
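For reference, a sketch of the hive-site.xml entry described in the first option above (the path is a placeholder and the description text is illustrative):

<property>
  <name>hive.exec.scratchdir</name>
  <value>/myvolume/tmp</value>
  <description>Scratch space for Hive jobs</description>
</property>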

How Hive Handles Scratch Directories on MapR

When a query requires Hive to query existing tables and create data for new tables, Hive uses the following workflow:

  1. Create the query scratch directory hive_<timestamp>_<randomnumber> under the Hive scratch directory. 

  2. Create the following directories as subdirectories of the scratch directory:

    1. Final query output directory. This directory's name takes the form -ext-<number>.

    2. An output directory for each MapReduce job. These directories' names take the form -mr-<number>.

  3. Hive executes the tasks, including MapReduce jobs and loading data to the query output directory.

  4. Hive loads the data from the output directory into a table.
    By default, the table's directory is in the /user/hive/warehouse directory. You can configure this location with the hive.metastore.warehouse.dir parameter in hive-site.xml, unless the table DDL specifies a custom location. Hive renames the output directory to the table directory in order to load the output data into the table.
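For example, the scratch layout for a single query might look like the following (the timestamp and numbers are hypothetical):

/user/mapr/tmp/hive/hive_2014-10-21_14-48-02_123_4567890/
    -ext-10000     (final query output directory)
    -mr-10001      (output directory for a MapReduce job)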

MapR uses volumes, which are logical units that enable you to apply policies to a set of files, directories, and sub-volumes. When the output directory and the table directory are in different volumes, this workflow involves moving data across volumes. This move is slower than moving data within a volume. In order to avoid moving data across a volume boundary, set the Hive scratch directory to be in the same volume as the target table.

To automatically create a scratch directory in the same volume as the target table, set the following property in hive-site.xml:

<property>
  <name>hive.optimize.insert.dest.volume</name>
  <value>true</value>
  <description>For CREATE TABLE AS and INSERT queries create the scratch directory under the destination directory. This avoids the data move across volumes and improves performance.</description>
</property>

These scratch directories are automatically deleted after the query completes successfully.

Hive Warehouse Directory

Hive tables are stored in the Hive warehouse directory. By default, MapR configures the Hive warehouse directory to be /user/hive/warehouse under the root volume. This default is defined in the $HIVE_HOME/conf/hive-default.xml.template file. 

To modify this parameter, perform one of the following operations:

  • Set this parameter in the hive-site.xml.
    Copy the hive.metastore.warehouse.dir property elements from the $HIVE_HOME/conf/hive-default.xml.template file and paste them into an XML configuration element in the $HIVE_HOME/conf/hive-site.xml file. Then, modify the value elements for these directories in the hive-site.xml file.  
  • Set this parameter from the Hive shell. Example:

    hive> set hive.metastore.warehouse.dir=/myvolume/mydirectory;
You will see better performance when queries move data between tables in the same volume.
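For reference, a sketch of the hive-site.xml entry described in the first option above (the path is a placeholder and the description text is illustrative):

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/myvolume/mydirectory</value>
  <description>Location of the Hive warehouse directory</description>
</property>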

Hive Error Logs Directory

As of Hive 0.13-1501, the log files are stored in /opt/mapr/hive/hive-<version>/logs/<user> by default. In previous versions, the log files were stored in /tmp/<user> by default.

To modify the log location:

  1. Configure hive.log.dir in the $HIVE_HOME/conf/hive-log4j.properties file. Example:

    hive.log.dir=<other_location> 
  2. Set the sticky bit on the new directory. Example:

    chmod 1777 <other_location>

Setting Up Hive with a MySQL Metastore

The metadata for Hive tables and partitions is stored in the Hive Metastore (for more information, see the Hive project documentation). By default, the Hive Metastore stores all Hive metadata in an embedded Apache Derby database in MapR-FS. Derby allows only one connection at a time; if you want multiple concurrent Hive sessions, use MySQL for the Hive Metastore. You can run the Hive Metastore on any machine that is accessible from Hive.

Prerequisites

  • Make sure MySQL is installed on the machine on which you want to run the Metastore, and make sure you are able to connect to the MySQL Server from the Hive machine. You can test this with the following command:

    mysql -h <hostname> -u <user>
  • The database administrator must create a database for the Hive metastore data, and the username specified in javax.jdo.option.ConnectionUserName must have permissions to access it. The database can be specified using the ConnectionURL parameter. The tables and schemas are created automatically when the metastore is first started. (A sketch of this database setup follows this list.)
  • The driver for the MySQL JDBC connector (a jar file) is part of the MapR distribution under /opt/mapr/lib/. Link this jar file into the Hive lib directory. For example:

    $ ln -s /opt/mapr/lib/mysql-connector-java-5.1.25-bin.jar \
      /opt/mapr/hive/hive-<version>/lib/mysql-connector-java-5.1.25-bin.jar
     
    $ ls -l /opt/mapr/hive/hive-0.13/lib/mysql-connector-java-5.1.25-bin.jar
    lrwxrwxrwx 1 root root 49 Oct 21 14:48
    /opt/mapr/hive/hive-0.13/lib/mysql-connector-java-5.1.25-bin.jar ->
    /opt/mapr/lib/mysql-connector-java-5.1.25-bin.jar

    In the ln command, <version> is your installed Hive version. 
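A sketch of the database setup described in the prerequisites, run as the MySQL administrator (the database name hive and the user hiveuser are hypothetical; choose your own names and supply a real password):

mysql> CREATE DATABASE hive;
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hiveuser'@'%' IDENTIFIED BY '<password>';
mysql> FLUSH PRIVILEGES;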

Configuring Hive for MySQL

Create the file hive-site.xml in the Hive configuration directory (/opt/mapr/hive/hive-<version>/conf) with the contents from the example below. Then set the parameters as follows:

  • You can set a specific port for Thrift URIs by adding the command export METASTORE_PORT=<port> into the file hive-env.sh (if hive-env.sh does not exist, create it in the Hive configuration directory). Example:

     export METASTORE_PORT=9083
  • To connect to an existing MySQL metastore, make sure the ConnectionURL parameter and the hive.metastore.uris parameter in hive-site.xml point to the metastore's host and port.
  • Once you have the configuration set up, start the Hive Metastore service using the following command (use tab auto-complete to fill in the <version>):

    /opt/mapr/hive/hive-<version>/bin/hive --service metastore

    You can run the metastore in the background with nohup hive --service metastore &.

Example hive-site.xml

<configuration>

 <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>

 <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
 </property>

 <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
 </property>

 <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value><fill in with password></value>
    <description>password to use against metastore database</description>
 </property>

 <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
 </property>

</configuration>

If you have not configured a MySQL metastore, do not run the Hive shell from a MapR NFS mount location; Hive will fail. The same problem occurs if you use the hive-site.xml file to configure the metastore on a MapR NFS mount location. Avoid both of these configurations.

Hive-HBase Integration

You can create HBase tables from Hive that can be accessed by both Hive and HBase. This allows you to run Hive queries on HBase tables. You can also convert existing HBase tables into Hive-HBase tables and run Hive queries on those tables as well.

Install and Configure Hive and HBase

1. Install and configure Hive if it is not already installed.

2. Install and configure HBase if it is not already installed.

3. Execute the jps command and verify that all relevant Hadoop, HBase, and ZooKeeper processes are running.

Example:

$ jps
21985 HRegionServer
1549 jenkins.war
15051 QuorumPeerMain
30935 Jps
15551 CommandServer
15698 HMaster
15293 JobTracker
15328 TaskTracker
15131 WardenMain

Configure the hive-site.xml File

1. Open the hive-site.xml file with your favorite editor, or create a hive-site.xml file if it doesn't already exist:

$ cd $HIVE_HOME
$ vi conf/hive-site.xml

2. Copy the following XML code and paste it into the hive-site.xml file.

Note: If you already have an existing hive-site.xml file with a configuration element block, just copy the property element block code below and paste it inside the configuration element block in the hive-site.xml file.

Example configuration (adjust the jar file paths and version numbers to match your installed Hive, HBase, and ZooKeeper):

<configuration>

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/mapr/hive/hive-0.10.0/lib/hive-hbase-handler-0.10.0-mapr.jar,file:///opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar,file:///opt/mapr/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.jar</value>
  <description>A comma separated list (with no spaces) of the jar files required for Hive-HBase integration</description>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>xx.xx.x.xxx,xx.xx.x.xxx,xx.xx.x.xxx</value>
  <description>A comma separated list (with no spaces) of the IP addresses of all ZooKeeper servers in the cluster.</description>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>5181</value>
  <description>The Zookeeper client port. The MapR default clientPort is 5181.</description>
</property>

</configuration>


3. Save and close the hive-site.xml file.

If you have successfully completed all the steps in this Install and Configure Hive and HBase section, you're ready to begin the Getting Started with Hive-HBase Integration tutorial in the next section.

Getting Started with Hive-HBase Integration

In this tutorial you will:

  • Create a Hive table
  • Populate the Hive table with data from a text file
  • Query the Hive table
  • Create a Hive-HBase table
  • Introspect the Hive-HBase table from HBase
  • Populate the Hive-HBase table with data from the Hive table
  • Query the Hive-HBase table from Hive
  • Convert an existing HBase table into a Hive-HBase table

Be sure that you have successfully completed all the steps in the Install and Configure Hive and HBase section before beginning this Getting Started tutorial.

This Getting Started tutorial closely parallels the Hive-HBase Integration section of the Apache Hive wiki; thanks are due to Samuel Guo and the other contributors to that effort. If you are familiar with their approach to Hive-HBase integration, you should be immediately comfortable with this material.

However, note that there are some significant differences in this Getting Started section, especially in regard to configuration and command parameters (or the absence of them). Follow the instructions in this tutorial exactly.

Create a Hive table with two columns:

Change to your Hive installation directory if you're not already there and start Hive:

$ cd $HIVE_HOME
$ bin/hive


Execute the CREATE TABLE command to create the Hive pokes table:

hive> CREATE TABLE pokes (foo INT, bar STRING);


To see if the pokes table has been created successfully, execute the SHOW TABLES command:

hive> SHOW TABLES;
OK
pokes
Time taken: 0.74 seconds

The pokes table appears in the list of tables. 

Populate the Hive pokes table with data

Execute the LOAD DATA LOCAL INPATH command to populate the Hive pokes table with data from the kv1.txt file.

The kv1.txt file is provided in the $HIVE_HOME/examples directory.

hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;


A message appears confirming that the data was loaded successfully, and the Hive prompt reappears:

Copying data from file:
...
OK
Time taken: 0.278 seconds
hive>


Execute a SELECT query on the Hive pokes table:

hive> SELECT * FROM pokes WHERE foo = 98;


The SELECT statement executes, runs a MapReduce job, and prints the job output:

OK
98      val_98
98      val_98
Time taken: 18.059 seconds

The output of the SELECT command displays two identical rows because there are two identical rows in the Hive pokes table with a key of 98. 

Note: This is a good illustration of the concept that Hive tables can have multiple identical keys. As we will see shortly, HBase tables cannot have multiple identical keys, only unique keys. 

To create a Hive-HBase table, enter these four lines of code at the Hive prompt:

hive> CREATE TABLE hbase_table_1(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "xyz");

After a brief delay, a message appears confirming that the table was created successfully:

OK
Time taken: 5.195 seconds


Note: The TBLPROPERTIES clause is not required, but those new to Hive-HBase integration may find it easier to understand what's going on if Hive and HBase use different names for the same table.

In this example, Hive will recognize this table as "hbase_table_1" and HBase will recognize this table as "xyz". 

Start the HBase shell:

Keeping the Hive terminal session open, start a new terminal session for HBase, then start the HBase shell:

$ cd $HBASE_HOME
$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4, rUnknown, Wed Nov  9 17:35:00 PST 2011

hbase(main):001:0>


Execute the list command to see a list of HBase tables:

hbase(main):001:0> list
TABLE
xyz
1 row(s) in 0.8260 seconds

HBase recognizes the Hive-HBase table named xyz. This is the same table known to Hive as hbase_table_1.

Display the description of the xyz table in the HBase shell:

hbase(main):004:0> describe "xyz"
DESCRIPTION                                                                       ENABLED
 {NAME => 'xyz', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE', REPLICATI true
 ON_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BL
 OCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0190 seconds


From the Hive prompt, insert data from the Hive table pokes into the Hive-HBase table hbase_table_1:

hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;
...
2 Rows loaded to hbase_table_1
OK
Time taken: 13.384 seconds


Query hbase_table_1 to see the data we have inserted into the Hive-HBase table:

hive> SELECT * FROM hbase_table_1;
OK
98      val_98
Time taken: 0.56 seconds


Even though we loaded two rows from the Hive pokes table that had the same key of 98, only one row was actually inserted into hbase_table_1. This is because hbase_table_1 is an HBase table, and although Hive tables support duplicate keys, HBase tables support only unique keys. HBase arbitrarily retains one row per key and silently discards the data associated with the duplicates.
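You can confirm this from the HBase side. In the HBase shell session you opened earlier, scan the table (you should see a single row with key 98):

hbase(main):005:0> scan "xyz"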

Convert a pre-existing HBase table to a Hive-HBase table

To convert a pre-existing HBase table to a Hive-HBase table, enter the following command (spanning four lines) at the Hive prompt.

Note that in this example the existing HBase table is my_hbase_table.

hive> CREATE EXTERNAL TABLE hbase_table_2(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
    > TBLPROPERTIES("hbase.table.name" = "my_hbase_table");


Now we can run a Hive query against the pre-existing HBase table my_hbase_table that Hive sees as hbase_table_2:

hive> SELECT * FROM hbase_table_2 WHERE key > 400 AND key < 410;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
...
OK
401     val_401
402     val_402
403     val_403
404     val_404
406     val_406
407     val_407
409     val_409
Time taken: 9.452 seconds

Getting Started with Hive-MapR Tables Integration

MapR tables, introduced in version 3.0 of the MapR distribution for Hadoop, use the native MapR-FS storage layer. A full tutorial on integrating Hive with MapR tables is available at Integrating Hive and MapR Tables.

Zookeeper Connections

If you see the following error message, ensure that hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort are properly defined in the $HIVE_HOME/conf/hive-site.xml file.

Failed with exception java.io.IOException:org.apache.hadoop.hbase.ZooKeeperConnectionException:
HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a
sign that the server has too many connections (30 is the default). Consider inspecting your
ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as
you can. See HTable's javadoc for more information.

Attachments:

sample-table.txt (text/plain)