Installing Custom Packages for PySpark

You can install custom Python packages either by manually installing the packages on each node in your MapR cluster or by using Conda. With Conda, you perform the installation from your Zeppelin host node without having to access the MapR cluster nodes directly. The topics in this section provide instructions for each method, for both Python 2 and Python 3.
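Whichever method you use, it can help to verify that the package is actually visible to every executor before relying on it in a job. The following is a minimal sketch, not part of the official procedure: it assumes a notebook paragraph running under the Livy PySpark interpreter (%livy.pyspark), an active SparkContext named sc, and uses pandas purely as an example package.

    %livy.pyspark
    # Sanity check: confirm a custom package (pandas, as an example)
    # is importable on every executor node.
    def pkg_version(_):
        try:
            import pandas            # the package you installed on each node
            return pandas.__version__
        except ImportError:
            return None              # None flags a node missing the package

    # Run one small task per partition so the check lands on multiple executors.
    print(set(sc.parallelize(range(16), 16).map(pkg_version).collect()))

If the printed set contains None, at least one node is missing the package; if it contains more than one version string, the nodes have inconsistent installs.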

With the Livy interpreter, you can use both Python 2 and Python 3 in your Zeppelin notebook. With the Spark interpreter, you can use only one version of Python.
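For example, the stock Zeppelin Livy interpreter group exposes separate %livy.pyspark and %livy.pyspark3 paragraph types, so one notebook can run both Python versions side by side. This is a minimal sketch assuming those default interpreter names; verify them against your Zeppelin configuration.

    %livy.pyspark
    import sys
    print(sys.version)    # reports the Python 2 interpreter for this session

    %livy.pyspark3
    import sys
    print(sys.version)    # reports the Python 3 interpreter for this session

With the Spark interpreter, by contrast, the single Python version is typically controlled by an interpreter setting (zeppelin.pyspark.python in stock Zeppelin); check your image's interpreter configuration for the exact property.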

Important: MapR supports the Python libraries included in the Zeppelin container, but does not support the libraries in custom Python packages. Use Python versions that match the versions installed on your MapR cluster nodes. Choosing a Zeppelin Docker image whose OS matches the OS running in your MapR cluster minimizes library version differences.
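One way to confirm that the container and the cluster agree on a Python version is to compare what the driver and the executors report. This is an illustrative sketch only, again assuming a %livy.pyspark paragraph with an active SparkContext named sc.

    %livy.pyspark
    import sys

    def worker_version(_):
        import sys
        return sys.version.split()[0]   # e.g. "2.7.5"

    # Compare the driver's Python version with the set of versions seen on
    # executor nodes; a mismatch points to differing installs in the cluster.
    print("driver:    " + sys.version.split()[0])
    print("executors: " + str(set(sc.parallelize(range(8), 8).map(worker_version).collect())))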