Requirements for Running Spark SQL Jobs

To run Spark SQL jobs, you must provide a directory in which to store Spark SQL table metadata. Specify the directory by setting the spark.sql.warehouse.dir property in the Spark application custom resource (CR). For example:
spark.sql.warehouse.dir: "maprfs:///spark-warehouse"
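In the SparkApplication CR format used by the open-source Spark operator (which this CR structure resembles), the property is set in the spec.sparkConf map. The following is a minimal sketch, not a complete application; the apiVersion and application name are assumptions to adapt to your deployment:

apiVersion: sparkoperator.k8s.io/v1beta2   # assumption: open-source Spark operator CRD version
kind: SparkApplication
metadata:
  name: spark-sql-example                  # hypothetical application name
spec:
  sparkConf:
    # Directory for storing Spark SQL table metadata
    spark.sql.warehouse.dir: "maprfs:///spark-warehouse"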
Also, a Hive Metastore service can be used with Compute Spaces (CSpaces). The Hive Metastore configmap, named mapr-hivesite-cm, is created automatically and deployed in the corresponding CSpace namespace. For an external cluster that has its own Hive Metastore service, you can create an additional configmap, named mapr-hivesite-external-cm, by using the gen-external-secrets.sh script (located in the /tools directory). To mount a configmap that contains metastore configuration to driver pods, do the following:
  1. Specify the configmap for the Spark application in the volumes section of the application CR. For example, your entry in the CR should look similar to the following:
    volumes:
    - name: hive-site-volume
      configMap:
        name: mapr-hivesite-external-cm
    The following configmaps are available by default:
    • mapr-hivesite-external-cm — points to the metastore located on an external cluster (if it exists)
    • mapr-hivesite-cm — points to the metastore located in the CSpace
  2. Mount the volume to the driver by using the volumeMounts property in the driver section of the application CR. For example, your entry in the CR should look similar to the following (a condensed sketch combining both steps appears after this procedure):
    driver:
      cores: 1
      coreLimit: "1000m"
      memory: "512m"
      labels:
        version: 2.4.4
      volumeMounts:
      - name: hive-site-volume
        mountPath: /opt/mapr/spark/spark-2.4.4/conf/hive-site.xml
        subPath: hive-site.xml
      serviceAccount: mapr-externalcspace-cspace-sa
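Taken together, the two steps produce a CR whose spec combines the volumes and driver sections. The following condensed sketch reuses the configmap and service-account names shown in the steps; other required fields of the CR (such as type, image, and mainClass) are omitted:

spec:
  volumes:
  - name: hive-site-volume
    configMap:
      name: mapr-hivesite-external-cm
  driver:
    cores: 1
    coreLimit: "1000m"
    memory: "512m"
    volumeMounts:
    # Overlays hive-site.xml from the configmap onto the Spark conf directory
    - name: hive-site-volume
      mountPath: /opt/mapr/spark/spark-2.4.4/conf/hive-site.xml
      subPath: hive-site.xml
    serviceAccount: mapr-externalcspace-cspace-sa

Using subPath mounts only the single hive-site.xml file into the existing conf directory, instead of shadowing the entire directory with the volume.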

A complete example of a Spark application with the above configuration is available in the mapr-spark-hive.yaml file in the examples/spark/ directory. To run the sample Spark Hive Metastore job using the mapr-spark-hive.yaml file, you must create a directory for the Spark application. For more information, see Requirements for Running the Sample Spark Applications.
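As a usage sketch, once the required directory exists, you would typically submit the sample CR with kubectl and then check its status; the namespace placeholder below stands in for your CSpace namespace, and the sparkapplications resource name assumes the Spark operator's CRD is installed:

kubectl apply -f examples/spark/mapr-spark-hive.yaml -n <cspace-namespace>
kubectl get sparkapplications -n <cspace-namespace>   # check application status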