Installing Kubeflow with MapR

In my previous blog in this series, Kubernetized Machine Learning and AI Using Kubeflow, I covered the Kubeflow project and how it integrates with and complements the MapR Data Platform.

Kubeflow is an application deployment framework and software repo for machine learning toolkits that run in Kubernetes.

In Kubeflow, Kubernetes namespaces are used to provide workflow isolation and per-tenant compute allocation capabilities. When combined with the global namespace and unified security capabilities provided by MapR, Kubeflow + MapR provides a fully comprehensive, multi-tenant environment for machine learning and AI applications.

Preparing to Install Kubeflow

This tutorial assumes that you have a working Kubernetes or Minikube cluster and that the MapR Volume Plugin is installed.

For steps to set these up, see the Kubernetes and Minikube getting-started guides and the MapR Volume Plugin documentation.

The versions that are used for the purposes of this tutorial are:

  • Kubernetes v1.12.2
  • Kubeflow 0.3.3
  • Docker v1.13.1
  • ksonnet 0.13.0

The full steps are also documented here in GitHub: https://github.com/rsilvery/kf_mapr

Create the namespace in Kubernetes that you intend to use for Kubeflow components (e.g., “kubeflow”):

kubectl create ns kubeflow

Configure Data Access from Kubeflow Namespace to MapR Data Platform

The components that need to be deployed in order to access a volume in the MapR Data Platform are a Kubernetes Secret (holding a MapR service ticket), a Persistent Volume, and a Persistent Volume Claim.

Creating the Secret

To create the Kubernetes Secret, you first need to get a long-lived service ticket from the MapR cluster that contains the volumes you plan to mount into your Kubernetes pods. Steps for generating this ticket can be found in the MapR security documentation.

Next, you need to take the string in the ticket file and Base64 encode it. You can do this with standard command-line tools or with a web-based encoder.
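For example, on Linux you can encode the ticket with the standard base64 utility. The ticket string below is a made-up placeholder; in practice you would encode your real ticket file:

```shell
# Encode the ticket string on a single line; -w 0 disables line wrapping.
# The echo here stands in for your real ticket. In practice, run:
#   base64 -w 0 /path/to/your/maprticket
echo -n 'demo.cluster.com abcdEFGHijkl...' | base64 -w 0
```

On macOS, BSD base64 does not wrap output by default, so the `-w 0` flag can simply be omitted.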

Download this secret template and place the Base64 encoded string into the CONTAINER_TICKET field as shown:

CONTAINER_TICKET: [base64 encoded ticket]
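Filled in, the secret looks roughly like the sketch below. The metadata name here is an assumption for illustration; keep whatever name the downloaded template uses, since the Persistent Volume will reference it:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mapr-kf-secret        # name assumed; keep the name from the template
  namespace: kubeflow
type: Opaque
data:
  CONTAINER_TICKET: ZGVtby5jbHVzdGVyLmNvbSBhYmNkRUZHSGlqa2wuLi4=   # your Base64-encoded ticket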

Create the Secret, using kubectl:

kubectl create -f kf-secret.yaml

Creating the Persistent Volume and Persistent Volume Claim

Download this Persistent Volume template and edit the following fields so that they point to your MapR cluster:

cluster: "[CLUSTER NAME: ex. my.cluster.com]"
cldbHosts: "[CLDB hosts]"
volumePath: "[Path to mnt: ex. /user/mapr/]"
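In context, those fields sit under the FlexVolume options in the Persistent Volume spec. The sketch below is an assumption based on the MapR volume plugin's conventions (driver name, option keys, and the secret name are illustrative); verify against the template you downloaded:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: claim-admin-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  flexVolume:
    driver: "mapr.com/maprfs"            # MapR volume plugin driver (assumed)
    options:
      cluster: "my.cluster.com"
      cldbHosts: "cldb1.example.com cldb2.example.com"
      volumePath: "/user/mapr/"
      securityType: "secure"
      ticketSecretName: "mapr-kf-secret" # the Secret created earlier (name assumed)
      ticketSecretNamespace: "kubeflow"
```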

Create the Persistent Volume using kubectl:

kubectl create -f claim-admin-pv.yaml

Create the Persistent Volume Claim using kubectl:

kubectl create -f https://raw.githubusercontent.com/rsilvery/kf_mapr/master/claim-admin-pvc.yaml
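The claim itself is a standard PVC that binds to the volume above; a minimal sketch, with the names assumed from the file names used in this tutorial, looks like this:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-admin-pvc        # name assumed from the file name
  namespace: kubeflow
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  volumeName: claim-admin-pv   # bind explicitly to the PV created above
```

Once both objects exist, `kubectl get pvc -n kubeflow` should show the claim in the Bound state.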

Install ksonnet

Ksonnet is the mechanism by which Kubeflow configurations are deployed to Kubernetes. This version of ksonnet requires a version of Go that is newer than 1.9, and steps to install this are in my GitHub doc.

Download and extract the latest version of ksonnet and create a symbolic link in /usr/local/bin:

wget https://github.com/ksonnet/ksonnet/releases/download/v0.13.0/ks_0.13.0_linux_amd64.tar.gz
tar -xzvf ks_0.13.0_linux_amd64.tar.gz
ln -s $(pwd)/ks_0.13.0_linux_amd64/ks /usr/local/bin/ks

Install Kubeflow

There are scripts for installing Kubeflow in the Kubeflow repo. The reason that I’m providing manual steps is that I’ve found that when you’re not using GCE, it’s hard to debug issues that come up in the scripted install. Additionally, I like to manually select the components that I install.

There are four main stages to installing Kubeflow:

  1. Setting up a local repo and downloading the Kubeflow components to it
  2. Selecting the items that you want to install and setting this in the ksonnet app
  3. Configuring all components and generating the manifests needed for deployment
  4. Pushing this configuration into your Kubernetes cluster

Before proceeding, please set the following environment variables:

export K8S_NAMESPACE=kubeflow
export DEPLOYMENT_NAME=kubeflow
export KUBEFLOW_VERSION=0.3.3
export KUBEFLOW_TAG=v${KUBEFLOW_VERSION}
export KUBEFLOW_DEPLOY=true
export KUBEFLOW_REPO=`pwd`/kubeflow/
export KUBEFLOW_KS_DIR=`pwd`/${DEPLOYMENT_NAME}_ks_app
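Note that the last two variables expand relative to your current directory, so the Kubeflow repo and the ksonnet app end up side by side wherever you run the commands. Purely for illustration, from a scratch directory:

```shell
# Demonstrate how the derived paths expand (scratch directory for illustration)
mkdir -p /tmp/kf-demo && cd /tmp/kf-demo
export DEPLOYMENT_NAME=kubeflow
export KUBEFLOW_VERSION=0.3.3
export KUBEFLOW_TAG=v${KUBEFLOW_VERSION}
export KUBEFLOW_REPO=`pwd`/kubeflow/
export KUBEFLOW_KS_DIR=`pwd`/${DEPLOYMENT_NAME}_ks_app
echo "$KUBEFLOW_TAG"       # v0.3.3
echo "$KUBEFLOW_KS_DIR"    # /tmp/kf-demo/kubeflow_ks_app
```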

Setting up local Kubeflow repo

Download the repo to a local directory:

curl https://raw.githubusercontent.com/kubeflow/kubeflow/${KUBEFLOW_TAG}/scripts/download.sh | bash

Configure ksonnet app and select components for install

Initialize ksonnet app:

cd $(dirname "${KUBEFLOW_KS_DIR}")
ks init $(basename "${KUBEFLOW_KS_DIR}")

Add local Kubeflow registry to ksonnet app:

cd $KUBEFLOW_KS_DIR
ks registry add kubeflow "${KUBEFLOW_REPO}"

Set default namespace for ksonnet app:

ks env set default --namespace $K8S_NAMESPACE

Select components for install. You can list the available components in the repo by running "ks pkg list":

ks pkg install kubeflow/core
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/examples

Generate manifests for Kubeflow components

ks generate ambassador ambassador
ks generate jupyterhub jupyterhub --namespace ${K8S_NAMESPACE}
ks generate centraldashboard centraldashboard
ks generate tf-job-operator tf-job-operator --namespace ${K8S_NAMESPACE}
ks generate argo argo --namespace ${K8S_NAMESPACE}
ks generate seldon seldon --namespace ${K8S_NAMESPACE}

Configure and deploy environment to Kubernetes

ks env add cloud
ks env set cloud --namespace ${K8S_NAMESPACE}
kubectl config set-context $(kubectl config current-context) --namespace=$K8S_NAMESPACE
ks apply cloud

And that’s nearly it! Kubeflow is now configured, and components should be up and running.

In order to access the web UIs from outside the cluster, you frequently need to configure an externally reachable service. This routes requests that arrive on a particular external IP and port to the pods backing the UI.

Here is an example YAML file that will do this for the JupyterHub UI (fill in the External IP field):

apiVersion: v1
kind: Service
metadata:
  name: jupyter-svc
  namespace: kubeflow
  labels:
    app: tf-hub
spec:
  ports:
  - port: 8000
    protocol: TCP
    name: jupyterhub
  externalIPs:
  - <External Host IP>
  selector:
    app: tf-hub

Summary

Hopefully, this tutorial has allowed you to get up and running with Kubeflow, using data stored in MapR. Come listen to my presentation on “Persistent Storage for Machine Learning in Kubeflow” at Strata San Francisco for more information.

In the next iteration, we will work through a simple use case that highlights how all of these components can be strung together into an ML workflow.


This blog post was published January 14, 2019.