Spark SQL Thrift Server

The Spark SQL Thrift server (Spark Thrift) was developed from Apache Hive HiveServer2 and operates like the HiveServer2 Thrift server. It is supported on secure clusters. You can run the Spark Thrift server and use Business Intelligence (BI) tools or the Beeline command-line tool to connect to the Hive versions that Spark 2.1.0 supports.

Starting in the MEP 4.0 release, the Spark Thrift server is available as a separate package. For instructions about installing this package, see Installing Spark Standalone or Installing Spark on YARN, depending on the type of cluster manager you are installing.

In MEP 3.0, MapR introduces additional security mechanisms for Spark with the Spark Thrift server. MapR-SASL and Kerberos are supported:

  • For JDBC connections into Spark Thrift server
  • Between Spark and Hive metastore

Starting in the MEP 4.0 release, running configure.sh -R on a secure cluster configures MapR-SASL security for the Spark Thrift server. The script modifies or creates a SPARK_HOME/conf/hive-site.xml file as follows:

  • If Hive is installed in your cluster, the script copies HIVE_HOME/conf/hive-site.xml to SPARK_HOME/conf and modifies the file.
  • If Hive is not installed and you are using MapR-SASL security, the script creates a new SPARK_HOME/conf/hive-site.xml file.
  • Each time the script runs, if there is a pre-existing SPARK_HOME/conf/hive-site.xml file, the script saves a copy of the file in SPARK_HOME/conf/hive-site.xml.old before modifying it.
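
The file handling described in the list above can be sketched in Python. This is only an illustration of the copy, create, and backup steps, not the actual configure.sh logic: the function name prepare_spark_hive_site, its parameters, and the returned labels are hypothetical, and the security properties that the real script writes into the file are not reproduced here.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_spark_hive_site(spark_home, hive_home=None, mapr_sasl=True):
    """Mimic the described handling of SPARK_HOME/conf/hive-site.xml.

    Returns a label for the action taken (labels are illustrative only).
    """
    spark_conf = Path(spark_home) / "conf"
    spark_conf.mkdir(parents=True, exist_ok=True)
    target = spark_conf / "hive-site.xml"

    # Save a copy of any pre-existing file before touching it.
    if target.exists():
        shutil.copy2(target, spark_conf / "hive-site.xml.old")

    hive_site = Path(hive_home) / "conf" / "hive-site.xml" if hive_home else None
    if hive_site is not None and hive_site.exists():
        # Hive is installed: start from Hive's own hive-site.xml.
        shutil.copy2(hive_site, target)
        return "copied"
    if mapr_sasl and not target.exists():
        # No Hive, MapR-SASL in use: create a new, empty configuration file.
        target.write_text("<configuration>\n</configuration>\n")
        return "created"
    return "kept"


# Demo in a throwaway directory (paths are illustrative, not real MapR paths).
with tempfile.TemporaryDirectory() as tmp:
    hive_home = Path(tmp) / "hive"
    (hive_home / "conf").mkdir(parents=True)
    (hive_home / "conf" / "hive-site.xml").write_text("<configuration/>")
    spark_home = Path(tmp) / "spark"
    first = prepare_spark_hive_site(spark_home, hive_home)
    second = prepare_spark_hive_site(spark_home, hive_home)
    backup_exists = (spark_home / "conf" / "hive-site.xml.old").exists()
    print(first, second, backup_exists)
```

On the second run the backup file exists because the first run left a hive-site.xml in place, which matches the "saves a copy before modifying it" behavior described above.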

You can also configure security manually by following the steps outlined in the subtopics listed on this page.

To launch the Spark Thrift server, first complete the procedures required to configure Spark to use Hive.

Important: Starting in the MEP 4.0 release, if you start and stop the Spark Thrift server using Warden, the connection port number is 2304. If you start and stop the server by running the /opt/mapr/spark/<spark-version>/sbin/{start,stop}-thriftserver.sh scripts, the port number remains 10000.
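
Because the port depends on how the server was started, a client has to pick the matching port when it builds its connection string. The following Python sketch shows one way to do that. It assumes the standard HiveServer2 JDBC URL form (jdbc:hive2://host:port/database); the helper names and the host node1.example.com are hypothetical, and any authentication parameters your cluster requires are left out.

```python
WARDEN_PORT = 2304   # port when Warden starts and stops the server (MEP 4.0 and later)
SCRIPT_PORT = 10000  # port when the sbin start/stop scripts are used


def thrift_port(managed_by_warden):
    """Return the Spark Thrift server port for the given start method."""
    return WARDEN_PORT if managed_by_warden else SCRIPT_PORT


def jdbc_url(host, managed_by_warden, database="default"):
    """Build a HiveServer2-style JDBC URL (no authentication parameters)."""
    return f"jdbc:hive2://{host}:{thrift_port(managed_by_warden)}/{database}"


# node1.example.com is a placeholder host name.
print(jdbc_url("node1.example.com", managed_by_warden=True))
print(jdbc_url("node1.example.com", managed_by_warden=False))
```

A URL built this way can be passed to Beeline or to a BI tool's JDBC connection settings.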

Default Behavior

The default behavior of the Spark Thrift server is as follows:

  1. After installation, the Spark Thrift server starts in local master mode.
  2. If the Spark master package is installed, the Spark Thrift server starts in standalone master mode.
  3. If the spark.master property is set in the spark-defaults.conf file, the Spark Thrift server uses the master that this property specifies.
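
The three rules above can be read as a precedence order, which the following Python sketch makes explicit. It assumes that later items in the list override earlier ones (the text does not state this outright), and it parses spark-defaults.conf in a simplified way. The function name resolve_master and the return labels "local" and "standalone" are illustrative, not values the server itself reports.

```python
import re


def resolve_master(spark_defaults_text="", master_package_installed=False):
    """Return the master the Spark Thrift server would use under the rules above."""
    configured = None
    for line in spark_defaults_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Simplified parsing: key and value separated by whitespace or "=".
        parts = re.split(r"[=\s]+", line, maxsplit=1)
        if len(parts) == 2 and parts[0] == "spark.master":
            configured = parts[1].strip()  # the last occurrence wins
    if configured:
        return configured  # rule 3: an explicit spark.master setting
    if master_package_installed:
        return "standalone"  # rule 2: Spark master package is installed
    return "local"  # rule 1: default after installation


print(resolve_master())
print(resolve_master(master_package_installed=True))
print(resolve_master("# cluster settings\nspark.master yarn\n", True))
```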

Known Limitations

  • MapR-SASL support is implemented only for Spark 2.1.0.
  • Username and password authentication through PAM is not supported in MEP 3.0.
  • Impersonation is supported only for SELECT statements that access data stored in MapR-FS or MapR-DB.
  • The Spark Thrift server supports only the features and commands in Hive 1.2.
  • Although Spark 2.1.0 can connect to Hive 2.1 Metastore, only Hive 1.2 features and commands are supported by Spark 2.1.0.