10 min read
The MapR Distribution including Apache™ Hadoop® employs drivers from Simba Technologies to connect to client ODBC and JDBC applications allowing you to access data on MapR from tools like Tableau with ODBC or SQuirreL with JDBC. This post will walk you through the steps to set up and connect your Apache Hive instance to both an ODBC and JDBC application running on your laptop or other client machine. Although you may already have your own Hive cluster set up, this post focuses on the MapR Sandbox for Hadoop virtual machine (VM). There are several steps to setting up, and then connecting Hive to your ODBC/JDBC app:
First, download the MapR Sandbox for Hadoop VM from this page: mapr.com/try-mapr/sandbox/
You will need one of VMware or VirtualBox to run the MapR VM. This post will use MapR 3.1, which is the latest at the time of writing.
After downloading the VM, start it up and you’ll be presented with a screen that looks like the following:
The first thing to do is to ensure that Hive is running correctly. Open your browser and enter the URL displayed in the MapR VM, which should open a page similar to this:
Select “Launch HUE” and enter the credentials of “mapr"/”mapr" at the login prompt. The new page should say “All OK. Configuration check passed.”
The next step is to create a table in Hive to query in our ODBC/JDBC application. There are a number of ways to do this, for this tutorial we’re going to import a CSV file and create a table in Hive referencing the CSV.
First, create a CSV named tab.csv in the VM with the following content:
1,12.34,2010-01-02 03:04:05 2,6.78,2012-09-10 11:12:00 3,134.567,2008-08-18 19:20:21
The easiest way is to create the file locally, then use a tool like WinSCP (for Windows) to upload the file to the VM. Use the credentials of “root”/”mapr” when logging into the VM for uploading the file. Mark the location that the file is uploaded to.
Now start the Hive shell to create and import the table with the following command:
Create a table in Hive using the following command:
CREATE TABLE tab ( id INT, col_1 DOUBLE, col_2 TIMESTAMP ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile;
Note that since we are using the Hive shell, this should be entered as one line.
Now import the data using the following command:
Load data local inpath ‘<local_path>/tab.csv’ into table tab;</local_path>
where <local_path>is the path on the VM to the tab.csv file that was uploaded.</local_path>
Verify that the table was successfully created using the following command:
select * from tab;
You should see the contents of the CSV returned as a table.
Follow the instructions to obtain the ODBC drivers for MapR from this page: doc.mapr.com/display/MapR/Hive+ODBC+Connector. Once downloaded, run the installer and choose where you would like to install the driver.
Once complete, open the start menu, navigate to the “MapR Hive ODBC Connector” item and choose the “32-bit ODBC Driver Manager” (or “64-bit ODBC Driver Manager”) option.
Select the “System DSN” tab.
Select the “Sample MapR Hive DSN – 32-bit” (or Sample MapR Hive DSN – 64-bit”), and press the “Configure…” button.
Enter the Host of your MapR Sandbox (this is the IP address displayed on the start-up page for the MapR Sandbox VM), and enter “mapr”/”mapr” for the User Name and Password and 10000 as the port id. Press the “Test” button to test your configuration.
Press OK to close the test result, and OK again to save the DSN configuration.
We’ll use 32 bit versions of Excel as well as Tableau to demonstrate connectivity via ODBC.
We’ll be using Excel 2013, however the driver will also work with older versions of Excel in the same way. Note that there are 32 and 64-bit versions of Excel, you should ensure the of the driver matches the version of Excel, otherwise the driver will not work correctly.
Start Excel and select the Data tab in the Ribbon.
Choose “Get External Data” -> “From Other Sources” -> “From Data Connection Wizard”. Select “ODBC DSN” in the dialog and press “Next >”.
Choose the “Sample MapR Hive DSN – 32-bit” (or “Sample MapR Hive DSN – 64-bit”) that was configured earlier and press “Next >”.
Enter “mapr” into the Password field of the connection dialog and press “OK”.
Choose the “tab” table that was set up previously and press Finish.
Press OK in the following dialog to import into Excel.
If prompted, enter “mapr” as the password for the connection dialog. Congratulations, data is now presented in Excel!
We’ll be using Tableau 8.2, however the driver will also work with older versions of Tableau. Note that there are 32 and 64-bit versions of Tableau, for the best experience you should match the version of the driver to that of Tableau, however 64-bit Tableau can also load 32-bit drivers.
Open Tableau and click on the “Connect to data” link.
Choose the “MapR Hadoop Hive” option in the left pane.
Enter the host of the MapR Sandbox VM (the IP address displayed on the start up page for the MapR Sandbox VM) in the Server field. Change the Type to HiveServer2, and the Authentication to “User Name and Password”. Enter “mapr”/”mapr” for the Username and Password fields, then press Connect.
Choose “default” in the Schema drop-down. You may need to press the small magnifying glass in the drop-down to find the schemas in your MapR Hive installation first.
Press the little magnifying glass in the Table field, and drag the “tab” table that we created earlier into the top right-hand pane.
Hit the “Update Now” button to preview the data.
Hit the “Go to Worksheet” button to start working with your data.
Congratulations, you can now work with your MapR Hive data in Tableau!
Follow the instructions to obtain the JDBC drivers for MapR from this page: If you are using the MapR Sandbox, or another MapR system, you will be able to copy the files directly from your installation. doc.mapr.com/display/MapR/JDBC+Connections.
You will also need a JDBC application, we’ll use SQuirreL which is available here: http://squirrel-sql.sourceforge.net/. After installing it, open the program. The first step is to install the MapR Hive JDBC driver. Switch to the “Drivers” on the left pane and hit the little green plus button:
Now switch to the “Extra Class Path” tab and hit the “Add” button to select all of the JAR files in the location that the JDBC driver files were extracted, and hit “Open”.
Hit the “List Drivers” button to automatically detect the driver class name, choosing the entry with HS2Driver in the name. To manually enter it, use “com.mapr.hive.jdbc3.HS1Driver” (or HS2Driver for HiveServer2) for the JDBC 3 driver and “com.mapr.hive.jdbc4.HS1Driver” (or HS2Driver for HiveServer2) for the JDBC 4 driver.
Enter “MapR Hive” into the Name field, “jdbc:hive2://host:port” into the Example URL field, and hit OK.
Now that the driver has been added, a connection needs to be made to your MapR VM that is running Hive. Switch to the “Aliases” tab on the left pane and hit the small green plus button to open the “Add Alias” dialog.
Enter “Hive” into the Name field and select the “MapR Hive” driver that we just created. In the URL field, replace host and port with the actual host and port for your MapR Hive instance. If using the MapR Sandbox this IP address is displayed on the start-up page for the VM. The default port for Hive is 10000. After filling in the information, press the OK button.
Press the Connect button in the next prompt, enter “mapr”/”mapr” for the User Name and Password fields. You should now see a live connection to Hive via JDBC:
Switch to the SQL tab and type:
select * from sample.tab
then press the button with the man running on it to execute the query and return the results of the table that was created previously.
You can type any SQL query in the dialog to be executed against Hive. If you wish to use HiveQL query language instead of SQL, append “;UseNativeQuery=0” to the end of the connection URL.
Congratulations! You’re now connected via JDBC. More importantly, you can now analyze your data using the power of SQL or HiveQL in your client app of choice.
Stay ahead of the bleeding edge...get the best of Big Data in your inbox.