MapR 4.0.x Documentation : Using HiveServer2

HiveServer2 allows multiple concurrent connections to the Hive server over a network.

HiveServer2 is included as a patch on the Hive 0.9.0 base release in the MapR distribution for Apache Hadoop. See Installing Hive for installation details. The mapr-hive package includes both HiveServer1 and HiveServer2, and you can choose which one to run.

This page contains the following topics:

Configuring Hive for HiveServer2

HiveServer2 accesses Hive data without alteration if you are not changing releases of Hive. You do not need to update or otherwise transform data in order to begin using HiveServer2. Simply enable support, as described below, and run hiveserver2 instead of the previous HiveServer.

To configure Hive for use with HiveServer2, include the following configuration properties in the /opt/mapr/hive/hive<version>/conf/hive-site.xml configuration file.

<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>

<property>
  <name>hive.zookeeper.quorum</name>
  <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
  <value><zk node1>,<zk node2>,...,<zk nodeN></value>
</property>

<property>
  <name>hive.zookeeper.client.port</name>
  <value>5181</value>
  <description>The Zookeeper client port. The MapR default clientPort is 5181.</description>
</property>

For the hive.zookeeper.quorum property above, substitute the <zk nodeX> values with a comma-separated list of the node hostnames or IP addresses running the ZooKeeper service.

For users migrating from HiveServer1, you might need to modify applications and scripts to interact with HiveServer2.

If you run a dedicated instance of HiveServer1 on each client because HiveServer1 does not support concurrent connections, you can replace those instances with a single instance of HiveServer2.

HiveServer2 uses a different connection URL for the JDBC driver. Existing scripts that use JDBC to communicate with HiveServer1 will need to be modified by changing the JDBC driver URL from jdbc:hive//<hostname>:<port> to jdbc:hive2://<hostname>:<port>.

Enabling SSL for HiveServer2

To enable SSL for HiveServer2, set the following parameters in the hive-site.xml file:

<property>
  <name>hive.server2.enable.ssl</name>
  <value>true</value>
  <description>enable/disable SSL communication</description>
</property>

<property>
  <name>hive.server2.ssl.keystore</name>
  <value><path-to-keystore-file></value>
  <description>path to keystore file</description>
</property>

You can specify the keystore password in the hive-site.xml file by adding the following parameter:

<property>
  <name>hive.server2.ssl.keystore.password</name>
  <value><password></value>
  <description>keystore password</description>
</property>

If you specify the password in the hive-site.xml file, protect the file with the appropriate file permissions.

HiveServer2 automatically prompts for the keystore password during startup when no password is stored in the hive-site.xml file.

Configuring Security Authentication for HiveServer2

LDAP Authentication using OpenLDAP

Include the following properties in the hive-site.xml file to enable LDAP authentication using OpenLDAP.

<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value><LDAP URL></value>
</property>
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value><LDAP Base DN></value>
</property>

Substitute <LDAP URL> with the access URL for your LDAP server. Substitute <LDAP BaseDN> with the base LDAP DN for your LDAP server, for example, ou=People,dc=mycompany,dc=com.

Setting up Authentication with Pluggable Access Modules

The following steps enable Pluggable Access Module (PAM) authentication for HiveServer2:

  1. Set the following config properties in the hive-site.xml file:

    1. For Hive 0.11 and Hive 0.12:

      <property>
          <name>hive.server2.authentication</name>
          <value>CUSTOM</value>
      </property>
      
      <property>
          <name>hive.server2.custom.authentication.class</name>
          <value>org.apache.hive.service.auth.PamAuthenticationProvider</value>
      </property>
      
      <property>
          <name>hive.server2.authentication.pam.profiles</name>
          <value>login, sudo</value>
          <description>comma separated list of pam modules to verify</description>
      </property>

    2. For Hive 0.13:

      <property>
          <name>hive.server2.authentication</name>
          <value>PAM</value>
      </property>

      <property>
          <name>hive.server2.authentication.pam.services</name>
          <value>login,sudo</value>
          <description>comma separated list of pam modules to verify</description>
      </property>

  2. Restart HiveServer2 to apply these changes.
  3. Enter the username and password in your Hive client. In the following example, Beeline is the client.

    ~$ beeline
    Beeline version 0.11-mapr by Apache Hive
    beeline> !connect jdbc:hive2://<HiveServer2Host>:<port>/default
    scan complete in 4ms
    Connecting to jdbc:hive2://<HiveServer2Host>:<port>/default
    Enter username for jdbc:hive2://<HiveServer2Host>:<port>/default: mapr
    Enter password for jdbc:hive2://<HiveServer2Host>:<port>/default: *******
    Hive history file=/tmp/mapr/hive_job_log_97d1cf06-bbf5-4abf-9bbb-d9ce56667fdf_941674138.txt
    Connected to: Hive (version 0.11-mapr)
    Driver: Hive (version 0.11-mapr)
    Transaction isolation: TRANSACTION_REPEATABLE_READ

Configuring Custom Authentication

To implement custom authentication for HiveServer2, create a custom Authenticator class derived from the following interface:

public interface PasswdAuthenticationProvider {
  /**
   * The Authenticate method is called by the HiveServer2 authentication layer
   * to authenticate users for their requests.
   * If a user is to be granted, return nothing/throw nothing.
   * When a user is to be disallowed, throw an appropriate {@link AuthenticationException}.
   *
   * For an example implementation, see {@link LdapAuthenticationProviderImpl}.
   *
   * @param user - The username received over the connection request
   * @param password - The password received over the connection request
   * @throws AuthenticationException - When a user is found to be
   * invalid by the implementation
   */
  void Authenticate(String user, String password) throws AuthenticationException;
}

The attached SampleAuthenticator.java code has an example implementation that has stored usernames and passwords.

Add the following properties to the hive-site.xml file, then restart Hiveserver2:

  <property>
    <name>hive.server2.authentication</name>
    <value>CUSTOM</value>
  </property>

  <property>
    <name>hive.server2.custom.authentication.class</name>
    <value>hive.test.SampleAuthenticator</value>
  </property>

Configuring User Impersonation

You must run Hive 0.11 or later with Version 3.0.2 or later of the MapR Distribution for Hadoop for user impersonation to work.

User impersonation enables Hive to submit jobs as a particular user. Without impersonation, Hive submits jobs as the user that started the HiveServer2 process. On a MapR cluster, this user is typically the mapr user or the user specified in the MAPR_USER environment variable. To enable impersonation for other users, log in as root, then create a file at $MAPR_HOME/conf/proxy/<username> for each user to impersonate.

To enable HiveServer2 to submit jobs to the MapR cluster as the user juser, log into the HiveServer2 node as root and run the following commands:

# mkdir $MAPR_HOME/conf/proxy
# chmod 755 $MAPR_HOME/conf/proxy
# touch $MAPR_HOME/conf/proxy/juser

To enable user impersonation for Hive, you have to set different properties for HiveServer2 and the Hive Metastore service:

  • hive.server2.enable.doAs: this property is for HiveServer2
  • hive.metastore.execute.setugi: this property is for the Hive Metastore service. If the Metastore service is running on a separate node, set this property on the Metastore node. Set this property for both the Metastore server and the Metastore client.

You also need to set proxyuser properties in the Hadoop core-site.xml file. See the following procedure.

To enable user impersonation:

  1. Set the following properties in the /opt/mapr/hive/<version>/conf/hive-site.xml file:
    <property>
      <name>hive.server2.enable.doAs</name>
      <value>true</value>
      <description>Set this property to enable impersonation in Hive Server 2</description>
    </property>
    <property>
      <name>hive.metastore.execute.setugi</name>
      <value>true</value>
      <description>Set this property to enable Hive Metastore service impersonation in unsecure mode. In unsecure mode, setting this property to true will cause the metastore to execute DFS operations using the client's reported user and group permissions. Note that this property must be set on both the client and server sides. If the client sets it to true and the server sets it to false, the client setting will be ignored.</description>
    </property>
  2. Set the following properties in the /opt/mapr/hadoop/hadoop-<version>/conf/core-site.xml file:
    <property>
      <name>hadoop.proxyuser.mapr.groups</name>
      <value>*</value>
      <description>Allow the superuser mapr to impersonate any member of any group</description>
    </property>
    <property>
      <name>hadoop.proxyuser.mapr.hosts</name>
      <value>*</value>
      <description>The superuser can connect from any host to impersonate a user</description>
    </property>

The impersonated user does not have the privileges to create the /user/hive/warehouse or /user/mapr-user/tmp/hive directories. Make sure these directories exist and have write permissions set for all users.

Example: Hive Impersonation

This example creates a table and loads data from the Beeline client, then lists the table to verify that it is owned by the impersonated user.

beeline> !connect jdbc:hive2://hostname:10000/default
scan complete in 2ms
Connecting to jdbc:hive2://hostname:10000/default
Enter username for jdbc:hive2://hostname:10000/default: userfoo
Enter password for jdbc:hive2://hostname:10000/default: ****
Connected to: Hive (version 0.11-mapr)
Driver: Hive (version 0.11-mapr)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hostname:10000/default> create table voter_table1(voternum INT,name string,age tinyint,registration string,contributions float,voterzone smallint);
No rows affected (0.22 seconds)
0: jdbc:hive2://hostname:10000/default> load data local inpath '/root/userfoo/hive/voter' into table voter_table1;
No rows affected (0.463 seconds)

To verify the ownership of the example table, run the following command:

[user@host ~]# hadoop fs -ls /user/hive/warehouse/voter_table1
Found 1 items
-rwxr-xr-x   3 userfoo users       8576 2013-10-18 14:48 /user/hive/warehouse/voter_table1/voter

Starting HiveServer2

If you are running the metastore in Remote mode, you need to start the metastore before HiveServer2.

hive --service metastore

To start the hiveserver2 service, execute the following command.

hive --service hiveserver2

Accessing Hive with the BeeLine client

HiveServer2 uses the BeeLine command line interface, and does not work with the Hive Shell used for HiveServer1. BeeLine is based on the SQLLine project, which is currently the best source of documentation. Refer to the SQLLine documentation page.

The following is an example of running basic SQL commands in BeeLine, using the same server that is running HiveServer2.

hive --service beeline
!connect jdbc:hive2://<hiveserver2 node>:10000 <hive username> <hive user password> org.apache.hive.jdbc.HiveDriver
show tables
select * from <table name>

Substitute <hiveserver2 node><hive username><hive user password>, and <table name> with valid values.

The example session below demonstrates a sample BeeLine session that lists tables and then lists all values in a table. The user is logged in as root, but this is not a requirement.

$ hive --service beeline
Beeline version 0.11-mapr by Apache Hive
beeline> !connect jdbc:hive2://10.10.100.56:10000 root mypasswd \
  org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://10.10.100.56:10000
Connected to: Hive (version 0.11-mapr)
Driver: Hive (version null)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://10.10.100.56:10000> load data local inpath '/user/data/hive/pokes' overwrite into table pokes;
0: jdbc:hive2://10.10.100.56:10000> show tables;
+-----------------+
|    tab_name     |
+-----------------+
| pokes           |
+-----------------+
1 rows selected (0.152 seconds)
0: jdbc:hive2://10.10.100.56:10000> select * from pokes;
+------+----------+
| foo  |   bar    |
+------+----------+
| 238  | val_238  |
| 86   | val_86   |
| 311  | val_311  |
| 27   | val_27   |
| 165  | val_165  |
| 409  | val_409  |
+------+----------+
6 rows selected (0.201 seconds)
0: jdbc:hive2://10.10.100.56:10000> exit
$ 

Connecting to HiveServer2

JDBC

The JDBC connection URI format and driver class for HiveServer2 are different from HiveServer1.

  • HiveServer2 uses URI format jdbc:hive2://<host>:<port> and class org.apache.hive.jdbc.HiveDriver.
  • HiveServer1 uses URI format jdbc:hive://<host>:<port> and class org.apache.hadoop.hive.jdbc.HiveDriver

ODBC

See Hive ODBC Connector for information on connecting to HiveServer2 via ODBC.

Configuring JDBC Clients for LDAP Authentication with HiveServer2

JDBC clients connect using a connection URL as shown below.

String url = "jdbc:hive2://hs2node:10000/default;user=<LDAP userid>;password=<password>"
Connection connection = DriverManager.getConnection(url);

Substitute <ldap userid> with the user id and <password> with the password for the client user.

Enabling SSL in Clients

Enabling SSL for JDBC

To enable SSL for JDBC, add the string ssl=true; to the URI string. JDBC requires a truststore and an optional truststore password. Applications that use JDBC, such as Beeline, can pass the truststore and the optional truststore password in the following ways:

Passing a truststore and truststore password in the URI string.

The following example uses this JDBC command to pass the truststore parameters in the URI string:

jdbc:hive2://<host>:<port>/<database>;ssl=true;sslTrustStore=<path-to-truststore>;sslTrustStorePassword=<password>
$ beeline
Beeline version 0.11-mapr by Apache Hive
beeline> !connect jdbc:hive2://127.0.0.1:10000/default;ssl=true;sslTrustStore=truststore.jks;sslTrustStorePassword=tsp
scan complete in 4ms
Connecting to jdbc:hive2://127.0.0.1:10000/default;ssl=true;sslTrustStore=truststore.jks;sslTrustStorePassword=tsp
Enter username for jdbc:hive2://127.0.0.1:10000/default;ssl=true;sslTrustStore=truststore.jks;sslTrustStorePassword=tsp: qa-user1
Enter password for jdbc:hive2://127.0.0.1:10000/default;ssl=true;sslTrustStore=truststore.jks;sslTrustStorePassword=tsp: ****    
Connected to: Hive (version 0.11-mapr)
Driver: Hive (version 0.11-mapr)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://127.0.0.1:10000/default> show tables;
+-------------------+
|     tab_name      |
+-------------------+
| table1            |
| table2            |
+-------------------+

Passing the truststore parameters as JVM arguments.

You can use the HADOOP_OPTS environment variable to pass JVM arguments to the Beeline client:

export HADOOP_OPTS="-Djavax.net.ssl.trustStore=<path-to-trust-store-file> -Djavax.net.ssl.trustStorePassword=<password>"
$ beeline
Beeline version 0.11-mapr by Apache Hive
beeline> !connect jdbc:hive2://127.0.0.1:1000/default;ssl=true
scan complete in 4ms
Connecting to jdbc:hive2://127.0.0.1:10000/default;ssl=true
Enter username for jdbc:hive2://127.0.0.1:10000/default;ssl=true: qa-user1
Enter password for jdbc:hive2://127.0.0.1:10000/default;ssl=true: ****
Connected to: Hive (version 0.11-mapr)
Driver: Hive (version 0.11-mapr)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://127.0.0.1:10000/default> show tables;
+-------------------+
|     tab_name      |
+-------------------+
| table1            |
| table2            |
+-------------------+

Using a CA signed certificate in JRE library

In this example, the SSL certificate is signed by a Certified Authority.

$ beeline
Beeline version 0.11-mapr by Apache Hive
beeline> !connect jdbc:hive2://127.0.0.1:1000/default;ssl=true
scan complete in 4ms
Connecting to jdbc:hive2://127.0.0.1:10000/default;ssl=true
Enter username for jdbc:hive2://127.0.0.1:10000/default;ssl=true: qa-user1
Enter password for jdbc:hive2://127.0.0.1:10000/default;ssl=true: ****
Connected to: Hive (version 0.11-mapr)
Driver: Hive (version 0.11-mapr)
Transaction isolation: TRANSACTION_REPEATABLE_READ

Using a self-signed SSL certificate

When you are using a self-signed certificate, import the certificate into an existing or new truststore file with the following command:

keytool -import -alias <alias> -file <path-to-cerficate-file> -keystore <truststorefile>

To install the self-signed certificate in the JRE, give the -keystore option in the above argument the value <JRE-HOME>/lib/security/cacerts.

Related Topics

Attachments: