Kubernetes Tutorial: How to Install and Deploy Applications at Scale on K8s - Part 3 of 3


Editor’s Note: This is Part 3 of a three-part series. See also Part 1 and Part 2 in this series.


In the previous two blogs, we looked at the business benefits of running MapR and Kubernetes, installed a Kubernetes cluster, and deployed the MapR Volume Driver Plugin for Kubernetes.

In this third and final blog of the trilogy, we will launch an application and services on our Kubernetes cluster, leveraging the MapR Data Platform as the persistent data store for our containers. At the bottom of this post, you will find two video demonstrations that leverage the MapR Volume Driver Plugin for Kubernetes.

This blog assumes you have finished the steps from the previous blogs.

PostgreSQL Database Server on Kubernetes and MapR

We will be launching a containerized PostgreSQL Database Server on Kubernetes, which will leverage MapR as its persistent store for the actual database files. Using Kubernetes with MapR as the persistent store allows for easily maintaining the business SLAs, as Kubernetes takes care of the container failover while MapR guarantees the high availability of the PostgreSQL database files.

All files required are in a GitHub project, so let's start with cloning that project:

# Clone the github repository
yum install -y git
git clone https://github.com/mkieboom/mapr-k8-postgres

For readability purposes, the Kubernetes configuration YAML file has been split into two parts. Open the first YAML file to understand what it does and to make some mandatory changes that connect Kubernetes to your MapR cluster environment:

# Configure the Kubernetes yaml files based on the information below
cd mapr-k8-postgres

vi mapr-k8-postgres-part1-volumedriver.yaml

The YAML file starts with the creation of the 'mapr-apps' namespace in Kubernetes:

# MapR Apps Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: mapr-apps
  labels:
    name: mapr-apps

Next, it will create a StorageClass in Kubernetes. This Kubernetes StorageClass points to your MapR Data Platform environment. Make sure to modify at least the 'restServers,' 'cldbHosts,' and 'cluster' parameters so that they reflect your MapR environment. Additionally, you can modify the other parameters to, for example, set the replication level or enable and disable auditing, quotas, and more:

# StorageClass for MapR Data Platform
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: maprfs-sc-postgres
  namespace: mapr-apps
provisioner: mapr.com/maprfs
parameters:
  # Configure below MapR cluster details to reflect your MapR cluster configuration
  restServers: ""
  cldbHosts: ""
  cluster: "demo.mapr.com"
  securityType: "unsecure"
  maprSecretName: "mapr-provisioner-secrets"
  maprSecretNamespace: "mapr-apps"
  namePrefix: "postgres"
  mountPrefix: "/postgres"
  reclaimPolicy: "Retain"
  advisoryquota: "100M"
  auditenabled: "1"
  forceauditenable: "1"
  replication: "3"
  minreplication: "2"
  nsreplication: "3"
  nsminreplication: "2"
  type: "rw"
  mount: "1"

With the StorageClass, we can create a PersistentVolumeClaim that references the StorageClass. One of the parameters to be configured is the maximum storage limit of the MapR Volume that will be created. In this example, we will dynamically create a volume of 300MB:

# PersistentVolumeClaim for MapR Data Platform
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: maprfs-pvc-postgres
  namespace: mapr-apps
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: maprfs-sc-postgres
  resources:
    requests:
      storage: 300M

Finally, it will create a Kubernetes Secret to store the MapR cluster authentication username and password. This allows Kubernetes to connect and manage the MapR cluster:

# Secret to authenticate with MapR Data Platform
apiVersion: v1
kind: Secret
metadata:
  name: mapr-provisioner-secrets
  namespace: mapr-apps
type: Opaque
data:
  # base64 encoding: echo -n 'mapr' | base64
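The Secret's data section holds the base64-encoded MapR credentials. A minimal sketch of generating them, assuming the MapR provisioner expects the `MAPR_CLUSTER_USER` and `MAPR_CLUSTER_PASSWORD` keys (verify the exact key names against your MapR Volume Driver Plugin documentation):

```shell
# Base64-encode the MapR cluster username and password for the Secret
echo -n 'mapr' | base64    # prints: bWFwcg==

# The resulting Secret data block would then look like:
#   data:
#     MAPR_CLUSTER_USER: bWFwcg==
#     MAPR_CLUSTER_PASSWORD: bWFwcg==
```

Note that base64 is an encoding, not encryption; anyone with read access to the Secret can decode the credentials.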

The next step is to configure the Kubernetes Service and Pod in the second YAML file:

# Configure the Kubernetes yaml files based on the information below
cd mapr-k8-postgres

vi mapr-k8-postgres-part2-container.yaml

Using a Kubernetes Service, we can expose ports from the Pod to the outside world. For the PostgreSQL Database Server, this means that we can map the database server port 5432 to a NodePort in the 30000-32767 range:

# Service
apiVersion: v1
kind: Service
metadata:
  name: mapr-k8-postgres
  namespace: mapr-apps
spec:
  type: NodePort
  selector:
    app: mapr-k8-postgres
  ports:
  - protocol: TCP
    port: 5432
    targetPort: 5432
    nodePort: 30003

The final configuration is the Pod itself. The container mounts the PersistentVolumeClaim as a volume inside the container. Also modify the environment variables (env) to reflect your MapR user on your MapR cluster.

# Pod
apiVersion: v1
kind: Pod
metadata:
  name: mapr-k8-postgres
  namespace: mapr-apps
  labels:
    app: mapr-k8-postgres
spec:
  containers:
    - name: mapr-k8-postgres
      imagePullPolicy: Always
      image: mkieboom/mapr-k8-postgres
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
      command:
        - /bin/bash
        - -c
        - "/launch.sh"
      env:
        - name: PGDATA_LOCATION
          value: "/postgres"
        - name: PG_DB
          value: "mapr"
        - name: PG_GROUP
          value: "mapr"
        - name: PG_USER
          value: "mapr"
        - name: PG_PWD
          value: "mapr"
        - name: PG_GID
          value: "5000"
        - name: PG_UID
          value: "5000"
      ports:
        - containerPort: 5432
      volumeMounts:
        - name: maprfs-pvc
          mountPath: "/postgres"
  volumes:
    - name: maprfs-pvc
      persistentVolumeClaim:
        claimName: maprfs-pvc-postgres

With everything configured properly, it's time to load the configuration files into your Kubernetes cluster to deploy the PostgreSQL container:

# Deploy the dynamic volume creation.
# Check the MapR Control System and notice the volume created
kubectl create -f mapr-k8-postgres-part1-volumedriver.yaml

# Deploy the PostgreSQL container:
kubectl create -f mapr-k8-postgres-part2-container.yaml
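If you prefer the command line over the Dashboard, the deployment can also be verified with kubectl; a sketch, assuming kubectl is configured against your cluster and the resource names from the YAML files above:

```shell
# Verify that the StorageClass exists and the claim was bound to a dynamically created MapR volume
kubectl get storageclass maprfs-sc-postgres
kubectl get pvc -n mapr-apps

# Watch the PostgreSQL pod until it reaches the Running state (Ctrl-C to stop watching)
kubectl get pods -n mapr-apps -w
```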

Use the Kubernetes Dashboard to check the deployment status. Once the Docker image has been pulled and the PostgreSQL Database Server is running, we can use a SQL client to connect to the database server, for example:

# Install psql client on any machine  
yum install -y postgresql

# Use psql client to connect to the database server (username/password: mapr/mapr)
# Connect to the PostgreSQL container by specifying a Kubernetes node as the host:
psql -U mapr -h k8snode01 -p 30003

# Run some basic SQL testing against the PostgreSQL container
CREATE SCHEMA test;
CREATE TABLE test.test (coltest varchar(20));
INSERT INTO test.test (coltest) VALUES ('It works!');
SELECT * FROM test.test;

Once data has been ingested into the database, let's simulate what happens when we shut down and relaunch the container. There are various ways to eliminate the running PostgreSQL container. You can either:

  1. Recreate the container by executing:
     kubectl delete -f mapr-k8-postgres-part2-container.yaml
     kubectl create -f mapr-k8-postgres-part2-container.yaml
  2. Run a docker kill command on the PostgreSQL container, or
  3. Shut down the Kubernetes node that the container is running on. (Make sure you don't shut down the Kubernetes master node.)

Based on the age, restart count, and the node on which the container is running (for example, via kubectl get pods -n mapr-apps -o wide), you can validate that the container was indeed restarted.

Kubernetes will automatically relaunch the container. Once the container is running again, you can relaunch a psql client to validate that the data is still available and safeguarded, as it is persistent on the MapR Data Platform:

# Use psql client to connect to the database server (username/password: mapr/mapr)
# Connect to the PostgreSQL container by specifying a Kubernetes node as the host:
psql -U mapr -h k8snode01 -p 30003

# Validate that the data is still available in the PostgreSQL container:
SELECT * from test.test;

With that, we've achieved automatic failover of a PostgreSQL Database Server, guaranteeing business SLAs, while in parallel running any other containerized legacy or modern application or business service on the same platform.


In this blog trilogy, you have experienced how the combination of Kubernetes and MapR allows for both data and application portability. Specifically, we looked at how to run a containerized PostgreSQL server that stores its data on the MapR Data Platform. Where the data is physically stored is completely transparent to the application container, as that is handled by the MapR Volume Driver Plugin for Kubernetes.

We also simulated an application container failure and how to recover from such a scenario. As the PostgreSQL data is persisted to the MapR Data Platform, a container failure has zero impact on the actual business data.

Allowing Kubernetes to simply and automatically restart the failed application container allows IT departments to easily maintain business SLAs.

In addition, combined with the demo shown in the videos below, running application servers like Postgres in containers can easily be mixed with containers running classic ETL processes as well as running new technologies like machine learning, using TensorFlow, for example. This clearly demonstrates the powerful combination of Kubernetes and the MapR Data Platform.

Video 1: Image Classification with TensorFlow on MapR and Kubernetes

In this video, I will demonstrate how to classify high resolution images using containerized applications. The video showcases bringing in an image, which can come from the datacenter, the edge, or the public cloud, and running TensorFlow on Kubernetes to classify it. Once the output images are available, they are stored on the MapR Data Platform.


Video 2: PostgreSQL as a Containerized Application on Kubernetes

In this video, I will demonstrate how you can dynamically and on-demand create a storage volume on MapR from Kubernetes. Secondly, I will explain how to launch a PostgreSQL Database Server container on Kubernetes, leveraging the dynamically created volume on MapR. Finally, I will show how MapR and Kubernetes jointly provide failover, recovery, and data retention for the PostgreSQL Database Server container.



This blog post was published April 27, 2018.
