You can create MapR-DB tables from Hive that can be accessed by both Hive and MapR-DB. You can run Hive queries on MapR-DB tables, convert existing MapR-DB tables into Hive-MapR-DB tables, and run Hive queries on those tables as well.
Install and Configure Hive
- Install and configure Hive if it is not already installed.
- Execute the jps command and ensure that all relevant Hadoop, MapR, and ZooKeeper processes are running.
- Open the hive-site.xml file with your favorite editor, or create a hive-site.xml file if it doesn't already exist:
- Copy the following XML code and paste it into the hive-site.xml file. If you already have an existing hive-site.xml file with a configuration element block, just copy the property element block code below and paste it inside the configuration element block in the hive-site.xml file. Be sure to use the correct values for the paths to your auxiliary JARs and the ZooKeeper IP addresses.
- Save and close the hive-site.xml file.
If you have successfully completed all of the steps in this section, you're ready to begin the tutorial in the next section.
Getting Started with Hive and MapR-DB Integration
In this tutorial we will:
- Create a Hive table
- Populate the Hive table with data from a text file
- Query the Hive table
- Create a Hive-MapR-DB table
- Introspect the Hive-MapR-DB table from the HBase shell
- Populate the Hive-MapR-DB table with data from the Hive table
- Query the Hive-MapR-DB table from Hive
- Convert an existing MapR-DB table into a Hive-MapR-DB table
This Getting Started tutorial is based on the Hive-HBase Integration section of the Apache Hive Wiki. However, please note that there are some significant differences.
Create a Hive table with two columns
Change to your Hive installation directory if you're not already there and start Hive:
Execute the CREATE TABLE command to create the Hive pokes table:
To see if the pokes table has been created successfully, execute the SHOW TABLES command:
The pokes table appears in the list of tables.
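As a rough sketch, the two statements described above might look like the following. The column names foo and bar and their types are assumptions borrowed from the Apache Hive getting-started example; this page does not state them.

```sql
-- Create a plain Hive table with two columns.
-- Column names and types (foo INT, bar STRING) are assumed for illustration.
CREATE TABLE pokes (foo INT, bar STRING);

-- List the tables in the current database; pokes should be among them.
SHOW TABLES;
```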
Populate the Hive pokes table with data:
The kv1.txt file is provided in the $HIVE_HOME/examples/files directory. Execute the LOAD DATA LOCAL INPATH command to populate the Hive pokes table with data from the kv1.txt file:
A message appears confirming that the data was loaded successfully, and the Hive prompt reappears:
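A minimal sketch of the load step follows. It assumes the Hive CLI was started from the Hive installation directory, so that the relative path resolves to $HIVE_HOME/examples/files/kv1.txt. The OVERWRITE keyword is also an assumption; omit it to append instead of replacing the table contents.

```sql
-- Load the sample file from the local file system into pokes.
-- The relative path assumes the Hive CLI was started from $HIVE_HOME.
-- OVERWRITE replaces any rows already in the table.
LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
```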
Execute a SELECT query on the Hive pokes table:
The SELECT statement executes, runs a MapReduce job, and prints the job output:
The output of the SELECT command displays two identical rows because there are two identical rows in the Hive pokes table with a key of 98.
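A query along these lines would produce the result described above. The column name foo is an assumption standing in for whatever the first (key) column of pokes is called.

```sql
-- Return every row whose first column equals 98.
-- foo is the assumed name of the key column in pokes.
SELECT * FROM pokes WHERE foo = 98;
```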
Hive tables can have multiple identical keys. As we will see shortly, MapR-DB tables cannot have multiple identical keys, only unique keys.
Create a Hive-MapR-DB table
Enter these four lines of code at the Hive prompt:
After a brief delay, a message appears confirming that the table was created successfully:
Note: The TBLPROPERTIES clause is not required, but those new to Hive-MapR-DB integration may find it easier to understand what's going on if Hive and MapR-DB use different names for the same table.
In this example, Hive will recognize this table as "mapr_table_1" and MapR-DB will recognize this table as "xyz".
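The four lines might look like the following sketch. The storage handler class is the one shipped with Hive's HBase integration. The Hive column names (key, value), the column family cf1, and the qualifier val are assumptions taken from the Apache Hive Wiki example. The table path /user/mapr/xyz matches the location this tutorial refers to later.

```sql
-- Hive name: mapr_table_1. MapR-DB name: /user/mapr/xyz.
-- ":key" maps the first Hive column to the row key;
-- "cf1:val" maps the second to column family cf1, qualifier val (assumed names).
CREATE TABLE mapr_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "/user/mapr/xyz");
```

Because the table is not declared EXTERNAL, Hive creates the underlying MapR-DB table itself, and dropping the Hive table also removes it.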
Start the HBase shell
Keeping the Hive terminal session open, start a new terminal session for HBase, then start the HBase shell:
Execute the list command to see a list of HBase tables
HBase recognizes the Hive-MapR-DB table named xyz in directory /user/mapr. This is the same table known to Hive as mapr_table_1.
Display the description of the /user/mapr/xyz table in the HBase shell
From the Hive prompt, insert data from the Hive table pokes into the Hive-MapR-DB table mapr_table_1
Query mapr_table_1 to see the data we have inserted into the Hive-MapR-DB table
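As a sketch, the insert and the follow-up query could be written as below. The filter on key 98 matches the rows discussed in this tutorial; the column name foo is again an assumption about the pokes schema.

```sql
-- Copy the rows with key 98 from the Hive table into the Hive-MapR-DB table.
INSERT OVERWRITE TABLE mapr_table_1 SELECT * FROM pokes WHERE foo = 98;

-- Read back what the MapR-DB table actually stored.
SELECT * FROM mapr_table_1;
```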
Even though we loaded two rows from the Hive pokes table that had the same key of 98, only one row was actually inserted into mapr_table_1. This is because mapr_table_1 is a MapR-DB table, and although Hive tables support duplicate keys, MapR-DB tables only support unique keys. When several rows share a key, MapR-DB arbitrarily retains only one of them and silently discards the data associated with the duplicates.
Convert a pre-existing MapR-DB table to a Hive-MapR-DB table
To convert a pre-existing MapR-DB table to a Hive-MapR-DB table, enter the following four commands at the Hive prompt.
Note that in this example the existing MapR-DB table is my_mapr_table in directory /user/mapr.
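A hedged sketch of such a conversion is shown below. The Hive table name mapr_table_2 is hypothetical, and the mapping assumes the existing MapR-DB table has a column family cf1 with a qualifier val; adjust both to match the real table.

```sql
-- EXTERNAL tells Hive to attach to a table that already exists
-- rather than create a new one.
-- mapr_table_2, cf1, and val are illustrative names, not taken from this page.
CREATE EXTERNAL TABLE mapr_table_2(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "/user/mapr/my_mapr_table");
```

With an external table, dropping the Hive table removes only the Hive metadata and leaves the MapR-DB table in place.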
Now we can run a Hive query against the pre-existing MapR-DB table /user/mapr/my_mapr_table through the Hive table created in the previous step.
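For instance, assuming the Hive-side table was given the hypothetical name mapr_table_2, a simple query could look like this:

```sql
-- mapr_table_2 is a placeholder for the Hive name chosen in the
-- CREATE EXTERNAL TABLE statement; the rows come from /user/mapr/my_mapr_table.
SELECT * FROM mapr_table_2;
```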