Enabling High Availability for Spark Thrift Server

To enable high availability for Spark Thrift Server, use the following steps:
  1. Install Spark Thrift Server on all the cluster nodes where it is needed:
    On Ubuntu:
    apt-get install mapr-spark-thriftserver
    On Red Hat / CentOS:
    yum install mapr-spark-thriftserver
    On SUSE:
    zypper install mapr-spark-thriftserver
  2. Add the following properties to the /opt/mapr/spark/spark-<spark_version>/conf/hive-site.xml file on every node where Spark Thrift Server is installed:
    <property>
    <name>hive.zookeeper.quorum</name>
    <value><zk_host_1>,<zk_host_2>,…,<zk_host_n></value>
    </property>
    
    <property>
    <name>hive.zookeeper.client.port</name>
    <value><zk_port></value>
    </property>
    
    <property>
    <name>hive.server2.support.dynamic.service.discovery</name>
    <value>true</value>
    </property>
    
    <property>
    <name>hive.server2.zookeeper.namespace</name>
    <value><zk_namespace></value>
    </property>
    For example:
    <property>
    <name>hive.zookeeper.quorum</name>
    <value>node1.cluster.com,node2.cluster.com,node3.cluster.com</value>
    </property>
    
    <property>
    <name>hive.zookeeper.client.port</name>
    <value>5181</value>
    </property>
    
    <property>
    <name>hive.server2.support.dynamic.service.discovery</name>
    <value>true</value>
    </property>
    
    <property>
    <name>hive.server2.zookeeper.namespace</name>
    <value>ts2-ts2</value>
    </property>
    Note: The value of the hive.server2.zookeeper.namespace property in the hive-site.xml file in the Spark directory should differ from the value in the hive-site.xml file in the Hive directory.
  3. Launch the ZooKeeper command line interface and check the Spark Thrift Server znode by running the following commands:
    /opt/mapr/zookeeper/zookeeper-<version>/bin/zkCli.sh -server <ip:port of zookeeper instance>
    ls /<hive.server2.zookeeper.namespace>
    For example:
    /opt/mapr/zookeeper/zookeeper-3.4.11/bin/zkCli.sh -server node1.cluster.com:5181
    ls /ts2-ts2
    [serverUri=node1.cluster.com:2304;version=;sequence=0000000000]
  4. Connect to the Spark Thrift Server from Beeline by using the following connection string:
    beeline> !connect jdbc:hive2://<hostname -f>:5181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hive.server2.zookeeper.namespace>;
    For example:
    ./bin/beeline
    Warning: Unable to determine $DRILL_HOME
    Beeline version 1.2.0-mapr-spark-MEP-6.0.0-1912 by Apache Hive
    beeline> !connect jdbc:hive2://node1.cluster.com:5181/default;ssl=true;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=ts2-ts2;auth=maprsasl;
    Connecting to jdbc:hive2://node1.cluster.com:5181/default;ssl=true;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=ts2-ts2;auth=maprsasl;
    20/03/29 21:38:19 WARN MaprSaslClient: SASL Server qopProperty: auth-confis different from Client: auth-conf,auth-int,auth.Using Server one
    Connected to: Spark SQL (version 2.4.4.0-mapr-630)
    Driver: Hive JDBC (version 1.2.0-mapr-spark-MEP-6.0.0-1912)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    1: jdbc:hive2://node1.cluster.com:5181/defaul> show databases;
    +---------------+
    | databaseName  |
    +---------------+
    | default       |
    +---------------+
    1 row selected (0.11 seconds)
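The connection string from step 4 can also be assembled in a small shell script, which helps keep the ZooKeeper address and namespace consistent with hive-site.xml. The following is a minimal sketch, not part of the product: the function name build_jdbc_url and the variable names are hypothetical, and the values are taken from the examples above.

```shell
#!/bin/sh
# Hypothetical helper: assemble a ZooKeeper service-discovery JDBC URL.
# The function and variable names are illustrative, not MapR tooling.

# Example values copied from the steps above; replace with your own.
ZK_ADDRESS="node1.cluster.com:5181"   # <zk_host>:<zk_port>
ZK_NAMESPACE="ts2-ts2"                # hive.server2.zookeeper.namespace
DATABASE="default"

build_jdbc_url() {
    # $1 = ZooKeeper address, $2 = database, $3 = ZooKeeper namespace
    printf 'jdbc:hive2://%s/%s;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s' \
        "$1" "$2" "$3"
}

JDBC_URL="$(build_jdbc_url "$ZK_ADDRESS" "$DATABASE" "$ZK_NAMESPACE")"
echo "$JDBC_URL"

# Possible usage (not executed here), quoting the URL so the shell
# does not treat the semicolons as command separators:
#   beeline -u "$JDBC_URL"
```

On a secure cluster, additional session parameters such as ssl=true and auth=maprsasl may need to be appended to the URL, as in the Beeline example in step 4.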
Note: High availability for the Spark Thrift Server can be used in conjunction with HiveServer2 high availability. For more information about HiveServer2 high availability, see Enabling High Availability for Hive.
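Because step 2 must be repeated on every node, it can be useful to confirm that each hive-site.xml file actually contains the four properties. The following is a rough sketch under stated assumptions: check_ha_props is a hypothetical function, it expects each <name> element to sit on a single line as in the examples above, and it does not validate the property values.

```shell
#!/bin/sh
# Hypothetical check: report which of the four high-availability
# properties from step 2 are missing from a given hive-site.xml file.
# Only the <name> elements are matched; values are not validated.

check_ha_props() {
    file="$1"
    missing=0
    for prop in \
        hive.zookeeper.quorum \
        hive.zookeeper.client.port \
        hive.server2.support.dynamic.service.discovery \
        hive.server2.zookeeper.namespace
    do
        # -F matches the string literally, so the dots are not wildcards.
        if ! grep -q -F "<name>${prop}</name>" "$file"; then
            echo "missing: ${prop}"
            missing=1
        fi
    done
    # Returns 0 when all properties were found, 1 otherwise.
    return "$missing"
}

# Run the check only when a file path is passed on the command line.
if [ "$#" -ge 1 ]; then
    check_ha_props "$1" && echo "all HA properties present"
fi
```

To use it, save the script under any name and pass the path of the hive-site.xml file in the Spark conf directory as the only argument.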