In this third blog of the trilogy, we will launch an application and services on our Kubernetes cluster, leveraging the MapR Data Platform as the persistent data store for the containers. At the bottom of this post, you will find two video demonstrations that leverage the MapR Volume Driver Plugin for Kubernetes.
This blog assumes you have finished the steps from the previous blogs.
We will be launching a containerized PostgreSQL Database Server on Kubernetes, which will leverage MapR as its persistent store for the actual database files. Using Kubernetes with MapR as the persistent store allows for easily maintaining the business SLAs, as Kubernetes takes care of the container failover while MapR guarantees the high availability of the PostgreSQL database files.
All files required are in a GitHub project, so let's start with cloning that project:
```shell
# Clone the github repository
yum install -y git
git clone https://github.com/mkieboom/mapr-k8-postgres
```
For readability purposes, the Kubernetes configuration YAML file has been split into two parts. Open the first YAML file to understand what is happening in that configuration file as well as make some mandatory changes to connect Kubernetes to your MapR cluster environment:
```shell
# Configure the Kubernetes yaml files based on the information below
cd mapr-k8-postgres
vi mapr-k8-postgres-part1-volumedriver.yaml
```
The YAML file starts with the creation of the 'mapr-apps' namespace in Kubernetes:
```yaml
# MapR Apps Namespace
---
apiVersion: v1
kind: Namespace
metadata:
  name: mapr-apps
  labels:
    name: mapr-apps
```
Next, it will create a StorageClass in Kubernetes. This Kubernetes StorageClass will point towards your MapR Data Platform environment. Make sure to modify at least the 'restServers,' 'cldbHosts,' and 'cluster' parameters, so that they reflect your MapR environment. Additionally, you can modify the other parameters to set, for example, the replication level, and enable or disable auditing, quota, and more:
```yaml
# StorageClass for MapR Data Platform
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: maprfs-sc-postgres
  namespace: mapr-apps
provisioner: mapr.com/maprfs
parameters:
  # Configure below MapR cluster details to reflect your MapR cluster configuration
  restServers: "172.16.1.61:8443"
  cldbHosts: "172.16.1.61"
  cluster: "demo.mapr.com"
  securityType: "unsecure"
  maprSecretName: "mapr-provisioner-secrets"
  maprSecretNamespace: "mapr-apps"
  namePrefix: "postgres"
  mountPrefix: "/postgres"
  reclaimPolicy: "Retain"
  advisoryquota: "100M"
  auditenabled: "1"
  forceauditenable: "1"
  replication: "3"
  minreplication: "2"
  nsreplication: "3"
  nsminreplication: "2"
  type: "rw"
  mount: "1"
```
With the StorageClass, we can create a PersistentVolumeClaim that references the StorageClass. One of the parameters to be configured is the maximum storage limit of the MapR Volume that will be created. In this example, we will dynamically create a volume of 300MB:
```yaml
# PersistentVolumeClaim for MapR Data Platform
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: maprfs-pvc-postgres
  namespace: mapr-apps
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: maprfs-sc-postgres
  resources:
    requests:
      storage: 300M
```
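Note that Kubernetes distinguishes decimal suffixes (M, G) from binary suffixes (Mi, Gi): the 300M requested above means 300 × 10⁶ bytes, slightly less than 300Mi. A quick shell check illustrates the difference:

```shell
# Kubernetes quantity suffixes: M = 10^6 bytes, Mi = 2^20 bytes
echo "300M  = $((300 * 1000 * 1000)) bytes"   # 300000000
echo "300Mi = $((300 * 1024 * 1024)) bytes"   # 314572800
```
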
Finally, it will create a Kubernetes Secret to store the MapR cluster authentication username and password. This allows Kubernetes to connect and manage the MapR cluster:
```yaml
# Secret to authenticate with MapR Data Platform
---
apiVersion: v1
kind: Secret
metadata:
  name: mapr-provisioner-secrets
  namespace: mapr-apps
type: Opaque
data:
  # base64 encoding: echo -n 'mapr' | base64
  MAPR_CLUSTER_USER: "bWFwcg=="
  MAPR_CLUSTER_PASSWORD: "bWFwcg=="
```
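If your MapR cluster uses credentials other than the default mapr/mapr, re-encode them before pasting the values into the Secret. For example:

```shell
# Encode a credential for the Kubernetes Secret (shown here for the default 'mapr' user)
echo -n 'mapr' | base64            # prints: bWFwcg==

# Decode to double-check what an existing Secret value contains
echo -n 'bWFwcg==' | base64 --decode   # prints: mapr
```

Remember the `-n` flag: without it, `echo` appends a newline that would become part of the encoded credential.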
The next step is to configure the Kubernetes Service and Pod in the second YAML file:
```shell
# Configure the Kubernetes yaml files based on the information below
cd mapr-k8-postgres
vi mapr-k8-postgres-part2-container.yaml
```
Using a Kubernetes Service, we can expose ports from the Pod to the outside world. For the PostgreSQL Database Server, this means that we can expose the database server port 5432 to a port in the 30000+ range:
```yaml
# Service
---
apiVersion: v1
kind: Service
metadata:
  name: mapr-k8-postgres
  namespace: mapr-apps
spec:
  type: NodePort
  selector:
    app: mapr-k8-postgres
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
      nodePort: 30003
```
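The nodePort value of 30003 is hand-picked; by default, Kubernetes only accepts NodePorts in the 30000-32767 range (configurable via the API server's `--service-node-port-range` flag). As a small sketch, a chosen port can be sanity-checked before editing the YAML:

```shell
# Check that a chosen nodePort falls in the default Kubernetes range (30000-32767)
NODE_PORT=30003
if [ "$NODE_PORT" -ge 30000 ] && [ "$NODE_PORT" -le 32767 ]; then
    echo "nodePort $NODE_PORT is within the default NodePort range"
else
    echo "nodePort $NODE_PORT is outside the default range" >&2
fi
```
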
The final configuration is the Pod itself. The container will mount the PersistentVolumeClaim as a volume inside the container. Also modify the environment variables (env) to reflect your MapR user on your MapR cluster.
```yaml
# Pod
---
apiVersion: v1
kind: Pod
metadata:
  name: mapr-k8-postgres
  namespace: mapr-apps
  labels:
    app: mapr-k8-postgres
spec:
  containers:
    - name: mapr-k8-postgres
      imagePullPolicy: Always
      image: mkieboom/mapr-k8-postgres
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
      command:
        - /bin/bash
        - -c
        - "/launch.sh"
      env:
        - name: PGDATA_LOCATION
          value: "/postgres"
        - name: PG_DB
          value: "mapr"
        - name: PG_GROUP
          value: "mapr"
        - name: PG_USER
          value: "mapr"
        - name: PG_PWD
          value: "mapr"
        - name: PG_GID
          value: "5000"
        - name: PG_UID
          value: "5000"
      ports:
        - containerPort: 5432
      volumeMounts:
        - name: maprfs-pvc
          mountPath: "/postgres"
  volumes:
    - name: maprfs-pvc
      persistentVolumeClaim:
        claimName: maprfs-pvc-postgres
```
With everything configured properly, it's time to load the configuration files into your Kubernetes cluster to deploy the PostgreSQL container:
```shell
# Deploy the dynamic volume creation.
# Check the MapR Control System and notice the volume created
kubectl create -f mapr-k8-postgres-part1-volumedriver.yaml

# Deploy the PostgreSQL container:
kubectl create -f mapr-k8-postgres-part2-container.yaml
```
Use the Kubernetes Dashboard to check the deployment status. Once the deployment has pulled in the Docker container and runs the PostgreSQL Database Server, we can use a SQL client to connect to the database server, for example:
```shell
# Install psql client on any machine
yum install -y postgresql

# Use psql client to connect to the database server (username/password: mapr/mapr)
# Connect to the PostgreSQL container by specifying a Kubernetes node as the host:
psql -U mapr -h k8snode01 -p 30003
```

```sql
-- Run some basic SQL testing against the PostgreSQL container
CREATE SCHEMA test;
CREATE TABLE test.test (coltest varchar(20));
INSERT INTO test.test (coltest) VALUES ('It works!');
SELECT * FROM test.test;
```
Once data has been ingested into the database, let's simulate what happens when we shut down and relaunch the container. There are various ways to eliminate the running PostgreSQL container; the simplest is to delete and recreate it from the YAML file:
```shell
# Delete the running PostgreSQL container
kubectl delete -f mapr-k8-postgres-part2-container.yaml

# Relaunch the PostgreSQL container
kubectl create -f mapr-k8-postgres-part2-container.yaml
```
Based on the pod's age, restart count, and the node on which it is running, you can validate that the container was indeed restarted.
Kubernetes will automatically relaunch the container. Once the container is running again, you can relaunch a psql client to validate that the data is still available and safeguarded, as it is persistent on the MapR Data Platform:
```shell
# Use psql client to connect to the database server (username/password: mapr/mapr)
# Connect to the PostgreSQL container by specifying a Kubernetes node as the host:
psql -U mapr -h k8snode01 -p 30003
```

```sql
-- Validate that the data is still available in the PostgreSQL container:
SELECT * FROM test.test;
```
With that, we've achieved an automatic failover of a PostgreSQL Database Server, guaranteeing business SLAs, while in parallel running any other containerized legacy or innovative application or business service on the same platform.
In this blog trilogy, you have experienced how the combination of Kubernetes and MapR allows for both data and application portability. Specifically, we looked at how to run a containerized PostgreSQL server that stores its data on the MapR Data Platform. Where the data is physically stored is completely transparent to the application container, as that is handled by the MapR Volume Driver Plugin for Kubernetes.
We also simulated an application container failure and how to recover from such a scenario. As the PostgreSQL data is persisted to the MapR Data Platform, a container failure has zero impact on the actual business data.
Allowing Kubernetes to simply and automatically restart the failed application container allows IT departments to easily maintain business SLAs.
In addition, as shown in the videos below, containerized application servers like PostgreSQL can easily be mixed with containers running classic ETL processes as well as newer technologies like machine learning with TensorFlow. This clearly demonstrates the powerful combination of Kubernetes and the MapR Data Platform.
In this video, I will demonstrate how to classify high-resolution images using containerized applications. The video showcases bringing in images, which can come from the datacenter, the edge, or the public cloud, and running TensorFlow on Kubernetes to classify them. Once the output images are available, they are stored on the MapR Data Platform.
In this video, I will demonstrate how you can dynamically and on-demand create a storage volume on MapR from Kubernetes. Secondly, I will explain how to launch a PostgreSQL Database Server container on Kubernetes, leveraging the dynamically created volume on MapR. Finally, I will show how MapR and Kubernetes jointly provide failover, recovery, and data retention for the PostgreSQL Database Server container.