Setting Affinity for Spark Pods Using Node Labels

You can run the Spark driver and executor pods on particular nodes by using the pod affinity feature. Labeling nodes lets you choose the node on which a pod is hosted. For more information, see Using Pod Affinity. The examples/spark/ directory contains a sample Spark application CR, mapr-spark-pi-affinity.yaml, that uses pod affinity.
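
To try the sample CR as-is, you can submit it to your CSpace namespace with kubectl. The following is a minimal sketch; it assumes the sample is submitted unmodified and that <cspace> is your CSpace namespace, so adjust the path and namespace for your deployment:
$ kubectl apply -f examples/spark/mapr-spark-pi-affinity.yaml -n <cspace>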

For example, suppose there are five nodes in your cluster, similar to the following:
$ kubectl get nodes
NAME              STATUS    ROLES     AGE  VERSION
mycluster-node1   Ready     <none>    17m  v1.16.2
mycluster-node2   Ready     <none>    17m  v1.16.2
mycluster-node3   Ready     <none>    17m  v1.16.2
mycluster-node4   Ready     <none>    17m  v1.16.2
mycluster-node5   Ready     <none>    17m  v1.16.2
To run the Spark driver and executor pods on specific nodes, do the following:
  1. Label the nodes on which you want to run the Spark driver and executor by using commands similar to the following (an optional way to check or remove these labels is shown after this procedure):
    $ kubectl label nodes mycluster-node1 compute=node1
    $ kubectl label nodes mycluster-node2 compute=node2
  2. Set the affinity property for the driver and executor in the Spark application CR. For example:
    spec:
      driver:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: compute
                  operator: In
                  values:   
                  - node1
      executor:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: compute
                  operator: In
                  values:   
                  - node2
  3. After you submit the Spark application, verify that the driver and executor are running on the nodes that you labeled by using the following command:
    kubectl get pods -n <cspace> -o wide
    
    For example, the following output shows the driver running on mycluster-node1 (labeled compute=node1) and the executors running on mycluster-node2 (labeled compute=node2), as configured in the previous steps:
    NAME                            READY   STATUS    RESTARTS   AGE   IP          NODE
    spark-pi-1556618823373-exec-1   1/1     Running   0          8s    10.0.0.14   mycluster-node2
    spark-pi-1556618823373-exec-2   1/1     Running   0          7s    10.0.0.15   mycluster-node2
    spark-pi-driver                 1/1     Running   0          35s   10.0.1.15   mycluster-node1
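
After labeling nodes in step 1, you can optionally confirm that the labels were applied, and remove them when they are no longer needed. This is a sketch using standard kubectl commands with the compute label key from the example above:
$ kubectl get nodes -L compute
$ kubectl label nodes mycluster-node1 compute-
$ kubectl label nodes mycluster-node2 compute-
The -L flag adds the compute label as a column in the node listing, and a trailing hyphen on the label key removes that label from a node.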