You can create HBase tables from Hive that can be accessed by both Hive and HBase. This allows you to run Hive queries on HBase tables. You can also convert existing HBase tables into Hive-HBase tables and run Hive queries on those tables as well.
Install and Configure Hive and HBase
1. Install and configure Hive if it is not already installed.
2. Install and configure HBase if it is not already installed.
3. Execute the jps command and ensure that all relevant Hadoop, HBase, and ZooKeeper processes are running.
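If you check the cluster often, the jps step can be scripted. The following Python sketch parses jps output (one "<pid> <main class>" pair per line) and reports which expected daemons are absent. The names in EXPECTED_DAEMONS are assumptions for a small Apache Hadoop/HBase deployment with HBase-managed ZooKeeper, which shows up as HQuorumPeer. A standalone ZooKeeper usually appears as QuorumPeerMain, and your distribution may run different daemons, so adjust the set.

```python
import subprocess

# ASSUMPTION: daemon names for a small Apache Hadoop/HBase cluster whose
# ZooKeeper is managed by HBase. Edit this set to match your own deployment.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "HMaster", "HRegionServer", "HQuorumPeer"}


def running_daemons(jps_output):
    """Return the main-class names listed in jps output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps_output, sorted."""
    return sorted(set(expected) - running_daemons(jps_output))


def check_local_daemons():
    """Run jps on this machine and print any expected daemons that are missing."""
    out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    missing = missing_daemons(out)
    if missing:
        print("Missing daemons:", ", ".join(missing))
    else:
        print("All expected daemons are running.")
```

Call check_local_daemons() on each node after starting the services. missing_daemons() also works on jps output that you saved earlier.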
1. Open the hive-site.xml file with your favorite editor, or create a hive-site.xml file if it doesn't already exist:
2. Copy the following XML code and paste it into the configuration element block of the hive-site.xml file:
3. Save and close the hive-site.xml file.
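Editing hive-site.xml by hand works fine. If you provision machines with scripts, though, a minimal Python sketch like this one can add or update a single property inside the configuration element using the standard library's xml.etree.ElementTree. It is a convenience helper under stated assumptions, not part of Hive. The property names used in the test are placeholders, not settings from this guide.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text, name, value):
    """Return configuration XML with property `name` set to `value`.

    Updates the <value> of an existing <property> whose <name> matches,
    or appends a new <property> to the <configuration> root if none matches.
    Note: the stdlib parser drops XML comments and processing instructions.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_elem = prop.find("value")
            if value_elem is None:
                value_elem = ET.SubElement(prop, "value")
            value_elem.text = value
            break
    else:
        # No existing entry: append <property><name/><value/></property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, set_property(text, "example.property.name", "example-value"). The property name here is hypothetical. Because comments in the original file are lost, keep a backup of hive-site.xml before writing the result back.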
Getting Started with Hive-HBase Integration
In this tutorial you will:
- Create a Hive table
- Populate the Hive table with data from a text file
- Query the Hive table
- Create a Hive-HBase table
- Introspect the Hive-HBase table from HBase
- Populate the Hive-HBase table with data from the Hive table
- Query the Hive-HBase table from Hive
- Convert an existing HBase table into a Hive-HBase table
Be sure that you have successfully completed all the steps in the Install and Configure Hive and HBase section before beginning this Getting Started tutorial. This tutorial closely parallels the Hive-HBase Integration section of the Apache Hive Wiki, with thanks to Samuel Guo and the other contributors to that effort.
Create a Hive table with two columns:
Change to your Hive installation directory, if you're not already there, and start Hive:
Execute the CREATE TABLE command to create the Hive pokes table:
To see if the pokes table has been created successfully, execute the SHOW TABLES command:
The pokes table appears in the list of tables.
Populate the Hive pokes table with data
Execute the LOAD DATA LOCAL INPATH command to populate the Hive pokes table with data from the kv1.txt file:
The kv1.txt file is provided in the
A message appears confirming that the data was loaded into the table successfully, and the Hive prompt reappears:
Execute a SELECT query on the Hive pokes table:
The SELECT statement executes, runs a MapReduce job, and prints the job output:
The output of the SELECT command displays two identical rows because there are two identical rows in the Hive pokes table with a key of 98.
Note: This illustrates that a Hive table can contain multiple rows with the same key. As we will see shortly, HBase tables cannot: every HBase row key is unique.
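To make the duplicate-key behavior concrete, the following Python sketch models the pokes table as a plain list of (key, value) rows, which, like a Hive table, keeps every row it is given. It assumes the input uses Hive's default field delimiter for text tables, Ctrl-A (\x01); the column names key and value and the sample rows in the test are made up for illustration.

```python
# Hive's default field delimiter for text tables is Ctrl-A (\x01).
FIELD_DELIM = "\x01"


def parse_rows(text):
    """Parse delimited lines into (key, value) tuples, keeping duplicates as Hive does.

    ASSUMPTION: two columns, an integer key followed by a string value.
    """
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        key, value = line.split(FIELD_DELIM, 1)
        rows.append((int(key), value))
    return rows


def select_by_key(rows, key):
    """Rough analogue of a SELECT ... WHERE <key column> = key over the modeled rows."""
    return [row for row in rows if row[0] == key]
```

Both matching rows come back, just as the SELECT above returned two rows for key 98.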
To create a Hive-HBase table, enter these four lines of code at the Hive prompt:
After a brief delay, a message appears confirming that the table was created successfully:
Note: The TBLPROPERTIES clause is not required; without it, HBase uses the same table name as Hive. If you are new to Hive-HBase integration, however, you may find it easier to follow what's going on when Hive and HBase use different names for the same table.
In this example, Hive will recognize this table as "hbase_table_1" and HBase will recognize this table as "xyz".
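The relationship between the Hive name and the HBase name can be sketched in Python as a small generator for the CREATE TABLE text. It is illustrative only, not the statement used in this tutorial. The column list and the hbase.columns.mapping value used in the test (":key,cf1:val", meaning the first Hive column becomes the HBase row key and the second is stored in column cf1:val) are hypothetical. The storage handler class org.apache.hadoop.hive.hbase.HBaseStorageHandler and the hbase.columns.mapping and hbase.table.name properties are the ones Hive's HBase integration uses. When hbase_table is omitted, no TBLPROPERTIES clause is emitted and HBase uses the Hive table name.

```python
def hive_hbase_ddl(hive_table, columns, mapping, hbase_table=None, external=False):
    """Build the text of a CREATE TABLE statement for a Hive table stored in HBase.

    columns: list of (name, hive_type) pairs, e.g. [("key", "int")].
    mapping: value for hbase.columns.mapping (illustrative, e.g. ":key,cf1:val").
    hbase_table: optional HBase table name; None means "same as the Hive name".
    external: True for CREATE EXTERNAL TABLE (mapping onto an existing HBase table).
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    lines = [
        f"CREATE {'EXTERNAL ' if external else ''}TABLE {hive_table}({cols})",
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'",
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")',
    ]
    if hbase_table is not None:
        # Give the table a different name on the HBase side.
        lines.append(f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}")')
    return "\n".join(lines) + ";"
```

For example, hive_hbase_ddl("hbase_table_1", [("key", "int"), ("value", "string")], ":key,cf1:val", "xyz") yields a four-line statement that Hive would know as hbase_table_1 and HBase as xyz.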
Start the HBase shell:
Keeping the Hive terminal session open, start a new terminal session for HBase, then start the HBase shell:
Execute the list command to see a list of HBase tables:
HBase recognizes the Hive-HBase table named xyz. This is the same table known to Hive as hbase_table_1.
Display the description of the xyz table in the HBase shell:
From the Hive prompt, insert data from the Hive table pokes into the Hive-HBase table hbase_table_1:
Query hbase_table_1 to see the data we have inserted into the Hive-HBase table:
Even though we loaded two rows from the Hive pokes table that had the same key of 98, only one row was actually inserted into hbase_table_1. This is because hbase_table_1 is an HBase table: although Hive tables support duplicate keys, HBase row keys are unique. When several rows with the same key are written, each write replaces the previous one, so only one row per key remains visible, and HBase gives no warning that the other rows were discarded. Which of the duplicate rows is kept is arbitrary.
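A rough Python model of why only one row survives: treat the HBase table as a dictionary keyed by row key, so each write with an existing key replaces the visible value. This is a simplification. HBase can retain older cell versions depending on the column family's VERSIONS setting, but a normal get or scan returns only the newest. Because the order in which Hive's tasks write duplicate keys is not guaranteed, treat the surviving row as arbitrary.

```python
def write_rows_to_hbase_model(rows):
    """Model writing (row_key, value) pairs to an HBase table.

    A dict holds at most one entry per key, mirroring HBase's unique row keys:
    a later write with the same key replaces the value that readers see.
    """
    table = {}
    for row_key, value in rows:
        table[row_key] = value  # same key: the newer write wins
    return table
```

Feeding it the two identical key-98 rows from the Hive table leaves a single entry, which matches what the Hive query on hbase_table_1 shows.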
Convert a pre-existing HBase table to a Hive-HBase table
To convert a pre-existing HBase table to a Hive-HBase table, enter the following four commands at the Hive prompt.
Note that in this example the existing HBase table is my_hbase_table.
Now we can run a Hive query against the pre-existing HBase table my_hbase_table that Hive sees as
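When you map Hive onto an existing HBase table, the mapping should only name column families that already exist in that table. You can list them with the describe command in the HBase shell. The following Python sketch performs that kind of sanity check before you run the CREATE statement. It is not Hive's own validation. It assumes the mapping lists :key explicitly and treats every other entry as family:qualifier (or family: for a whole family). It ignores newer mapping features such as type suffixes. The table layouts in the test are hypothetical.

```python
def parse_mapping(mapping):
    """Split an hbase.columns.mapping value into (family, qualifier) pairs.

    ':key' becomes (None, None); 'cf:' (a whole column family) becomes ('cf', None).
    """
    entries = []
    for raw in mapping.split(","):
        entry = raw.strip()
        if entry == ":key":
            entries.append((None, None))
            continue
        family, sep, qualifier = entry.partition(":")
        if not sep or not family:
            raise ValueError(f"bad mapping entry: {entry!r}")
        entries.append((family, qualifier or None))
    return entries


def check_mapping(hive_columns, mapping, existing_families):
    """Return a list of problems; an empty list means the mapping looks usable.

    ASSUMPTION: one mapping entry per Hive column, with :key listed explicitly.
    """
    entries = parse_mapping(mapping)
    problems = []
    if entries.count((None, None)) != 1:
        problems.append("mapping must contain exactly one :key entry")
    if len(entries) != len(hive_columns):
        problems.append(f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries")
    for family, _ in entries:
        if family is not None and family not in existing_families:
            problems.append(f"column family {family!r} not in HBase table")
    return problems
```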