Requirements for Running Spark Jobs with Dependencies

If your Spark application includes a main file and a dependency file, you must currently mount the MapR Filesystem directories that contain those files to the Spark pods by using the CSI driver; you cannot reference the dependency files through the maprfs:// scheme.

For example, suppose your Spark application includes the following files:
  • main.py – the main application file
  • dependency.py – a file that contains additional functions used inside the main file
The path to main.py is specified through the mainApplicationFile property, and the path to dependency.py is specified through the deps:pyFiles property in the Spark application CR.
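For orientation, both properties sit at the top level of the SparkApplication spec. The following is a minimal sketch, not a complete CR: the apiVersion, metadata name, and file paths are placeholders, and the local:// paths assume the CSI mount configuration described in the rest of this section.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2   # verify against your Spark operator
kind: SparkApplication
metadata:
  name: spark-py-with-deps                 # placeholder name
spec:
  type: Python
  mode: cluster
  mainApplicationFile: local:///maprfs-csi/apps/py-test/main.py   # main application file
  deps:
    pyFiles:
      - local:///maprfs-csi/apps/py-test/dependency.py            # additional modules
```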

You must create a PersistentVolume (PV) that mounts a volume on the MapR Filesystem, and a PersistentVolumeClaim (PVC) in the Compute Space that binds to that PV. You can create the PV and PVC manually (see the CSI guide) or run the ticketcreator.sh utility in the CSpace terminal, which can create them for you. Copy your files to the maprfs path backing the PV, then reference the PV and PVC as a volume and a volume mount in the CR. This reference lets you point to your files with a local:// file path instead of a maprfs:// file path.
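If you create the PV and PVC manually rather than with ticketcreator.sh, the pair looks roughly like the following. This is a hedged sketch: the driver name, volumeAttributes, and secret name follow common MapR CSI driver conventions, and every value shown is a placeholder to replace with your cluster's details; see the CSI guide for the authoritative field list.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mapr-csi-pv
spec:
  capacity:
    storage: 5Gi                      # placeholder size
  accessModes:
    - ReadOnlyMany
  csi:
    driver: com.mapr.csi-kdf          # MapR CSI driver name; verify against your install
    volumeHandle: mapr-csi-pv
    volumeAttributes:
      volumePath: "/apps/py-test"     # MapR Filesystem directory to expose
      cluster: "my.cluster.com"       # placeholder cluster name
      cldbHosts: "cldb1 cldb2"        # placeholder CLDB hosts
      securityType: "secure"
    nodePublishSecretRef:
      name: mapr-ticket-secret        # user secret created by ticketcreator.sh
      namespace: my-cspace            # placeholder CSpace namespace
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mapr-csi-pvc
  namespace: my-cspace                # placeholder CSpace namespace
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 5Gi
  volumeName: mapr-csi-pv             # bind explicitly to the PV above
```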

For example, suppose the main.py and dependency.py files for your Spark application are on a local file system. To mount the files using the local:// file path, follow these steps:
  1. Log in to the CSpace Terminal pod by using ssh.
  2. Run the ticketcreator.sh utility by running the following command:
    /opt/mapr/kubernetes/ticketcreator.sh
  3. Enter the username and password of the user for whom you want to create a ticket, and the name for the user secret.
  4. Enter y when prompted whether to create a CSI PersistentVolumeClaim (PVC) and PersistentVolume (PV) for Spark secondary MapR Filesystem dependencies, and provide a name for the PVC and PV. The default names are mapr-csi-pvc for the PVC and mapr-csi-pv for the PV. For information about the ticketcreator.sh utility prompts, see Creating MapR Credentials for Spark Applications in Compute Spaces.
  5. Move or copy the files on the local file system to a directory, such as maprfs:///apps/py-test, in the MapR Filesystem.
  6. Open the Spark application CR, add the following information, and save and close the CR:
    1. Add volume and mount configuration information. For example, your entry in the Spark application CR should look similar to the following:
      driver:
        ...
        volumeMounts:
          - name: maprfs-volume
            mountPath: /maprfs-csi
      ...
      volumes:
      - name: maprfs-volume
        persistentVolumeClaim:
          claimName: mapr-csi-pvc
    2. Use the local:// scheme to point to the files. For example, you can specify maprfs:///apps/py-test/main.py as local:///<mountPath>/apps/py-test/main.py, which with the mount path above becomes local:///maprfs-csi/apps/py-test/main.py. That is, your entry in the Spark application CR should look similar to the following:
      ...
      mainApplicationFile: local:///maprfs-csi/<path_to_main_file>/main.py
      deps:
        pyFiles:
        - local:///maprfs-csi/<path_to_file>/dependency.py
      For a complete sample CR, see the mapr-spark-py-with-dependencies.yaml in the examples/spark/ directory.
  7. Apply the Spark application CR. For more information, see Deploying the Spark Application.
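Steps 5 and 7 above are typically run from the CSpace terminal. The following is a sketch, assuming the hadoop CLI is on the PATH, that the CR is saved as mapr-spark-py-with-dependencies.yaml, and that my-cspace is a placeholder namespace:

```shell
# Step 5: copy the application files into the MapR Filesystem
hadoop fs -mkdir -p /apps/py-test
hadoop fs -put main.py dependency.py /apps/py-test/

# Step 7: apply the Spark application CR in the CSpace namespace
kubectl apply -f mapr-spark-py-with-dependencies.yaml -n my-cspace
```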