Deploying the MapR Data Platform on Azure Container Service with Kubernetes Orchestrator

Contributed by Rafael Godinho

15 min read

Introduction

Big data developers and QA professionals need a robust big data platform where they can concentrate their efforts on software development and code testing before rolling out to production. However, getting access to test and staging environments can be challenging, as these are often not self-service and require IT assistance. Because of this gap, time-to-market can suffer, and product life cycles become too long to adapt to today's speed of business.

Fortunately, containerized services offer a solution to narrow this gap. In my previous post, I showed how to spin up a mini containerized MapR cluster in a single virtual instance. That works fine for single-user environments but does not scale. What if you have a team of developers who want to collaborate on the very same containerized MapR cluster? A single virtual instance will not satisfy that need.

Introducing Azure Container Service (ACS). It is an Azure service that makes it simpler to create, configure, and manage a cluster of virtual machines preconfigured to run containerized applications. It uses optimized configurations of popular open-source scheduling and orchestration tools such as Kubernetes, DC/OS, and Docker Swarm. This means there is no need to change your existing management practices and tools to move container workloads to Azure; you can keep using the tools you are used to. It is possible to deploy a full-blown MapR cluster in less than an hour, with no need to rely on the ever-busy IT professionals or to consume a large hardware environment.

MapR has been working closely with ACS to make the deployment much easier. In this blog post, I will walk you through the steps to deploy the MapR Data Platform on ACS and demonstrate the capabilities of the MapR Persistent Application Client Container (PACC) for deploying containerized applications that leverage the MapR Platform as a persistence tier. Note that the configuration described here is not supported by MapR for production deployments; use it only for test, demo, training, or development environments.

Prerequisites

Before you start, please set up an account on Azure. You can sign up for one here. Additionally, install Docker on your computer or a cloud instance by following the instructions here. That's it; now you are ready to start deploying!
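Before moving on, it can help to confirm that the Docker installation is working. A quick sanity check (assuming a standard Docker install with daemon access) is:

```shell
# Verify the Docker client can reach the daemon
docker version

# Optionally run the hello-world image as an end-to-end check;
# --rm cleans up the container after it exits
docker run --rm hello-world
```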

Step 1 – Download and start a pre-built container, then log in to Azure

On the computer or cloud instance where you installed Docker, run the following command:

docker run --name azm --hostname azm -it maprazure/azm:latest /bin/bash

You are now in the azm container, and at the prompt you need to log in to your Azure account. This container already has Azure CLI 2.0 installed; you can find more information on the documentation page Get started with Azure CLI 2.0. Here is a quick summary of how to log in:

[root@azm /]# az login

Follow the instructions to complete the login process. Example: To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code BT376Q5W8 to authenticate.

After you log in successfully, you should get the prompt back in a short moment.
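If you want to confirm that the login succeeded, the Azure CLI can show which subscription the session is bound to (the `<subscription-id>` placeholder below is yours to fill in):

```shell
# Show the subscription this session is authenticated against
az account show --output table

# If your account has several subscriptions, select the one
# you want the Kubernetes cluster deployed into
az account set --subscription <subscription-id>
```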

Step 2 – Deploy a Kubernetes cluster

To deploy a Kubernetes cluster, you need to execute the "deploy-k8" command. You can provide various options; to view them, specify the "-h" option for the help menu:

[root@azm /]# deploy-k8 -h

Usage: deploy-k8 [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -g GNAME, --resource-group=GNAME
                        Azure Resource Group Name. default: myacs
  -d DNS_PREFIX, --dns-prefix=DNS_PREFIX
                        DNS Prefix for Kubernetes Hosts.
  -l LOC, --location=LOC
                        Azure Region, e.g. westus, eastus, etc. default:
                        eastus
  -a APPNAME, --app-name=APPNAME
                        Azure Application Name. default: mykubecluster
  -p APPPASSWORD, --app-password=APPPASSWORD
                        Azure Application Password
  -s VMSIZE, --vm-size=VMSIZE
                        VM size of the Kubernetes agents. default:
                        Standard_D2_v2
  -c AGENTCOUNT, --agent-count=AGENTCOUNT
                        Number of the Kubernetes agents. default: 3
  -q, --quiet           don't print status messages to stdout

There are default values for each option. At minimum, you should specify the password and DNS prefix when deploying a Kubernetes cluster; the password is a key the Kubernetes application requires to authenticate with the Azure infrastructure.


For example:

[root@azm ~]# deploy-k8 -p M@prtest1 -d myk8

In about 10 minutes, you will get the shell prompt back, which means the Kubernetes cluster is deployed. Now go to the Azure portal (http://portal.azure.com) and select Resource Groups; assuming you didn't specify a resource group name, the default resource group myacs is listed as below.

Select myacs and you will see it includes quite a few resources, including virtual machines, load balancers, storage, and so on. By default, one Kubernetes master and three agents are created, with VM size Standard_D2_v2.
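Instead of the portal, you can also inspect the new resource group from the azm container with the Azure CLI (using the default group name myacs; substitute your own if you changed it):

```shell
# List every resource Azure created in the resource group
az resource list --resource-group myacs --output table

# Show just the virtual machines and their sizes
az vm list --resource-group myacs \
  --query "[].{name:name, size:hardwareProfile.vmSize}" --output table
```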


Step 3 - Login to Kubernetes master node and deploy MapR Data Platform

On the azm container, issue “ssh-master” command:

[root@azm ~]# ssh-master

This logs you in to the Kubernetes master node. To check whether the Kubernetes cluster is ready, issue this command at the prompt, for example:

root@k8s-master-C5E9779-0:~# kubectl get nodes
NAME                   STATUS                     AGE
k8s-agent-c5e9779-0    Ready                      5m
k8s-agent-c5e9779-1    Ready                      5m
k8s-agent-c5e9779-2    Ready                      5m
k8s-master-c5e9779-0   Ready,SchedulingDisabled   5m

The output indicates that one master and three agents are ready. Now you can move forward and deploy a MapR cluster by issuing the "deploy-mapr" command; again, the -h option gives you the help menu:

root@k8s-master-C5E9779-0:~# deploy-mapr -h
Usage: deploy-mapr [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --maprv=MAPRV, --mapr-version=MAPRV
                        MapR version. default: 520
  --mep=MEP             MEP version. default: 2.0
  -c CLNAME, --cluster-name=CLNAME
                        MapR cluster name. default: mapr520
  -n NNODES, --mapr-nodes=NNODES
                        MapR cluster size. default: 3
  -a ADMIN, --admin-user=ADMIN
                        MapR admin username. default: mapruser
  -p PASSWD, --admin-password=PASSWD
                        MapR admin user password
  -s MODE, --security-mode=MODE
                        MapR security mode: base, sec or kdc. default: base
  -d LDAPUSER, --ldap-user=LDAPUSER
                        MapR ldap username. default: ldapuser
  -q, --quiet           don't print status messages to stdout

At minimum, you should provide an admin password to manage MapR, for example:

root@k8s-master-C5E9779-0:~# deploy-mapr -p M@prtest1

This kicks off the MapR installation. By default, three MapR containers are deployed, along with an LDAP container for user directory lookup, a Metastore container for Apache Hive, a MapR client container, a Squid proxy container, and a Cockpit container used to visualize and manage the Kubernetes cluster.
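While the installation runs, you can also watch the containers come up with kubectl from the master node. This assumes the default cluster name, which corresponds to the mapr520 namespace (the same namespace used in the re-deploy step at the end of this post):

```shell
# Watch the MapR pods start across the agents; -o wide shows
# which agent VM each pod landed on
kubectl get pods --namespace=mapr520 -o wide --watch

# Once pods appear, check which services have been exposed
kubectl get services --namespace=mapr520
```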

About halfway through, you will see messages like the following, indicating that Kubernetes is configuring the Azure load balancer so you can access the Cockpit portal from the internet, for example:

Waiting for load balancer to open up cockpit port - 5 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 10 seconds, est. 5 min
……..
Waiting for load balancer to open up cockpit port - 185 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 190 seconds, est. 5 min
Please point your browser's at http://13.64.77.133:9090 for cockpit access.

Now point your browser at that URL and you should see the Cockpit portal. Log in as root with the password you provided for the MapR admin above.


Once you are logged in, you will see the console as below. Click the "Cluster" tab at the top, then click "Topology" in the left pane and select "mapr520" in the Project drop-down menu, as shown in the graphs below.


Now you should see an animation of your MapR cluster being deployed on Kubernetes. The black circles are the VMs in the Kubernetes cluster; the blue circles are the MapR containers being spun up.

Wait until the MapR deployment is finished. You should see something similar to the messages below on your Kubernetes master console:

<snip>
Waiting for load balancer to open up proxy port - 40 seconds, est. 5 min
Waiting for load balancer to open up proxy port - 45 seconds, est. 5 min
Waiting for load balancer to open up proxy port - 50 seconds, est. 5 min
Waiting for load balancer to open up proxy port - 55 seconds, est. 5 min
All Done!!
===============================================
Please point your browser's at http://13.64.77.136:9090 for cockpit access.

Please configure your browser's proxy setting to IP: 13.64.116.25 and Port: 30128
and then point your browser at https://mapr520-node0:8443 to access MCS
You can also point your browser at http://mapr520-node0:8047 to access Apache Drill
===============================================

To get inside these containers, click "Containers" in the left pane and select your desired container (e.g., mapr-client).


Once you are in the mapr-client container, you can issue commands such as:

  1. df – you will see a /posix mount point; this mount point lets you interact with MapR XD using POSIX-compliant Linux commands.
  2. hadoop fs -ls / – the all-too-familiar command for showing the MapR XD contents.
  3. id ldapuser – note that we have spun up an LDAP container for centralized username lookup; you should see the uid and gid of user ldapuser even though it is not in the local /etc/passwd file.
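Run from a shell inside the mapr-client container, the three checks above look roughly like this (the uid/gid values for ldapuser will differ in your deployment):

```shell
# 1. The /posix mount point exposes MapR XD to ordinary Linux tools
df -h /posix

# 2. List the top level of MapR XD with the Hadoop CLI
hadoop fs -ls /

# 3. Resolve ldapuser via the LDAP container, then confirm it is
#    not defined locally
id ldapuser
grep ldapuser /etc/passwd || echo "ldapuser is not a local user"
```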


Step 4 - Configure your browser to access MapR Control System (MCS)

We have also deployed a Squid proxy container that allows you to access the MapR cluster. To do this, open your browser's proxy settings (Firefox is used as the example here), fill in the HTTP proxy field with the proxy IP from step 3 above, enter port 30128, and click OK. Examples are shown in the following two graphs.
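If you prefer to verify the proxy from the command line before touching browser settings, a quick check with curl works too. The IP and port below are the example values from the deploy-mapr output above; substitute your own:

```shell
# Request the MCS login page headers through the Squid proxy;
# -x routes the request via the proxy, -k accepts the
# self-signed certificate, -I fetches headers only
curl -x http://13.64.116.25:30128 -k -I https://mapr520-node0:8443
```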


Now point your browser at MCS at https://mapr520-node0:8443 and log in as user mapr with the admin password you provided in step 3. You should be able to start managing the MapR cluster from the MCS portal. The cluster comes with a basic license that is sufficient to get you started; however, if you wish to explore the full features of the MapR Data Platform, such as MapR Database and MapR Event Store, you will have to install a free 30-day unlimited trial license. You can do so by following the "Apply the Trial License" section in my previous blog post.


Apache Drill is also available by pointing your browser at http://mapr520-node0:8047.


Step 5 – Deploy PACC services

Tug Grall wrote a great blog post on getting started with the MapR PACC service. Basically, a sensor PACC collects its host's performance stats (I/O, memory, CPU load, etc.) and publishes them to a MapR Event Store topic; a webserver PACC then consumes these stream messages and serves them over HTTP so you can view them in a browser.

I have prepared a script in /opt/pacc on the Kubernetes master node, execute it as follows:

root@k8s-master-238E7C1E-0# cd /opt/pacc

root@k8s-master-238E7C1E-0:/opt/pacc# bash deploy_pacc

deployment "sensor-deploy" created
deployment "websr-deploy" created
service "mapr-pacc-svc" created
Waiting for load balancer to open up cockpit port - 5 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 10 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 15 seconds, est. 5 min
…….
Waiting for load balancer to open up cockpit port - 90 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 95 seconds, est. 5 min
PACC deployment Done...
point your browser at http://40.83.251.214

In the browser, you can see these stats refresh every 3 seconds as they are published and consumed in real time.


Step 6 – Scale your PACC deployment according to demand

One very nice feature of Kubernetes is the ability to scale your services up and down dynamically according to demand, with no downtime required. In your Cockpit window, you can see the one sensor container and two web server containers spun up by the script in the previous step. The web servers are spread across two VMs (black circles) for high availability. Both are attached to a service (orange circle) for load balancing, and the service is associated with a public internet IP address on the Azure load balancer to allow external access.
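You can confirm this layout from the Kubernetes master as well; the service name below, mapr-pacc-svc, comes from the deploy_pacc output in the previous step:

```shell
# Show the service and the external IP Azure assigned to it
kubectl --namespace=pacc get service mapr-pacc-svc

# List the sensor and web server pods and the agent VM
# each one landed on
kubectl --namespace=pacc get pods -o wide
```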


Now suppose the demand suddenly peaks and you need more web server containers to serve the load. Issue the following command on the Kubernetes master node to scale the number of web servers from 2 to 8:

root@k8s-master-238E7C1E-0:/opt/pacc# kubectl --namespace=pacc scale deployment websr-deploy --replicas=8

In a few seconds, you should see six more web servers pop up in the Cockpit window to handle the higher load. Kubernetes also makes sure these new containers are distributed across the hosts (black circles) as evenly as possible. Note that the other window, which is displaying the stats, continues without any disruption.
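Scaling down works the same way: when demand subsides, you can shrink the deployment back and Kubernetes will terminate the surplus pods, again without interrupting the service:

```shell
# Scale the web servers back down to the original two replicas
kubectl --namespace=pacc scale deployment websr-deploy --replicas=2

# Confirm the desired and current replica counts match
kubectl --namespace=pacc get deployment websr-deploy
```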


Lastly, should you ever want to re-deploy the cluster, first remove the existing one, e.g., "kubectl delete namespace mapr520", and then run "deploy-mapr" with your desired options to re-deploy.

Summary

We have demonstrated how to spin up a MapR cluster on the Azure Container Service platform and how to manage PACC with Kubernetes orchestration. The benefit of this solution is a faster time-to-market software development cycle: software developers can confidently test their code in this environment before releasing to production.

If this article is of interest to you, please find out more about MapR at mapr.com or ask technical questions at our community forum, community.mapr.com.


This blog post was published April 04, 2017.