Big data developers and QA professionals need a robust big data platform where they can concentrate their efforts on software development and code testing before rolling out to production. However, getting access to test and staging environments can be challenging, as these are often not self-enabled and require IT assistance. Because of this gap, time-to-market could be adversely affected and product life cycles become too long to adapt to today’s speed of business.
Fortunately, containerized services offer a way to narrow this gap. In my previous post, I showed how to spin up a mini containerized MapR cluster in a single virtual instance. That works fine for single-user environments but does not scale. What if you have a team of developers who want to collaborate on the very same containerized MapR cluster? A single virtual instance cannot satisfy that need.
Enter Azure Container Service (ACS), an Azure service that makes it simpler to create, configure and manage a cluster of virtual machines that are preconfigured to run containerized applications. It uses an optimized configuration of popular open-source scheduling and orchestration tools such as Kubernetes, DC/OS and Docker Swarm. This means there is no need to change your existing management practices and tools to move container workloads to Azure; you can keep using the tools you are used to. It is possible to deploy a full-blown MapR cluster in less than an hour, with no need to rely on ever-busy IT professionals for assistance or to consume a large hardware environment.
MapR has been working closely with ACS to make the deployment much easier. In this blog post, I will walk you through the steps needed to deploy the MapR Data Platform on ACS and demonstrate how the MapR Persistent Application Client Container (PACC) lets you deploy containerized applications that use the MapR Platform as a persistence tier. Note that the configuration described here is not supported by MapR: it should not be used for production deployments, only for test, demo, training or development environments.
Before you start, please set up an account on Azure. You can sign up for one here. Additionally, install Docker on your computer or a cloud instance by following the instructions here. That’s it; you are now ready to start deploying!
Step 1 – Download and start a pre-built container, then log in to Azure
On the computer or cloud instance where you installed Docker, run the following command:
docker run --name azm --hostname azm -it maprazure/azm:latest /bin/bash
You are now inside the azm container. At the prompt, log in to your Azure account. The container already has Azure CLI 2.0 installed; you can find more information about it on the documentation page Get started with Azure CLI 2.0. Here is a quick summary of how to log in to your Azure account:
[root@azm /]# az login
Follow the instructions to complete the login process. For example:
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code BT376Q5W8 to authenticate.
Once you have logged in successfully, you should get the prompt back after a short moment.
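Before moving on, it can be worth confirming which subscription the CLI session is using, since the cluster will presumably be created in the active one. The following is a minimal sketch built on the standard `az account show` command; the function name is a hypothetical helper and is not something shipped in the azm container.

```shell
# Print the name of the subscription the current az session is using.
# `az account show` is a standard Azure CLI command; the wrapper name
# show_active_subscription is only an illustrative helper.
show_active_subscription() {
    az account show --query name --output tsv
}
```

Calling `show_active_subscription` at the azm prompt prints the subscription name. If it is not the one you expect, `az account set --subscription <name>` switches to another.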
Step 2 – Deploy a Kubernetes cluster
To deploy a Kubernetes cluster, run the “deploy-k8” command. It accepts various options; to view them, pass the “-h” option for the help menu:
[root@azmaster payload]# deploy-k8 -h
Usage: deploy-k8 [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -g GNAME, --resource-group=GNAME
                        Azure Resource Group Name. default: myacs
  -d DNS_PREFIX, --dns-prefix=DNS_PREFIX
                        DNS Prefix for Kubernetes Hosts.
  -l LOC, --location=LOC
                        Azure Region, e.g. westus, eastus, etc. default: eastus
  -a APPNAME, --app-name=APPNAME
                        Azure Application Name. default: mykubecluster
  -p APPPASSWORD, --app-password=APPPASSWORD
                        Azure Application Password
  -s VMSIZE, --vm-size=VMSIZE
                        VM size of the Kubernetes agents. default: Standard_D2_v2
  -c AGENTCOUNT, --agent-count=AGENTCOUNT
                        Number of the Kubernetes agents. default: 3
  -q, --quiet           don't print status messages to stdout

Most options have default values. At a minimum, you should specify the password and the DNS prefix when deploying a Kubernetes cluster. The password is a key that the Kubernetes application needs in order to authenticate with the Azure infrastructure.
[root@azm ~]# deploy-k8 -p M@prtest1 -d myk8
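If the defaults do not fit, the other flags from the help text can be combined in the same call. The sketch below wraps such a call in a small function with a dry-run switch, so the command line can be inspected before anything is created. The function name, the `DRY_RUN` variable and the example values are assumptions made for illustration; only the flags themselves come from the `deploy-k8` help output.

```shell
# Hypothetical wrapper around deploy-k8 (flags taken from its help text).
# $1 = DNS prefix, $2 = Azure region, $3 = agent count, $4 = application password
# With DRY_RUN=1 the command is printed (password included) instead of executed.
deploy_k8_custom() {
    # Rebuild the positional parameters as the full command line.
    set -- deploy-k8 -g myacs -d "$1" -l "$2" -c "$3" -s Standard_D2_v2 -p "$4"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, with `DRY_RUN=1` set, `deploy_k8_custom myk8 westus 5 'M@prtest1'` prints the full command line for a five-agent cluster in westus without running it.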
In about 10 minutes, you will get the shell prompt back, which means the Kubernetes cluster has been deployed. Now go to the Azure portal (http://portal.azure.com) and select Resource Groups. Assuming you didn’t specify a resource group name, the default resource group myacs is listed, as shown below.
Select myacs and you will see that it includes quite a few resources, including virtual machines, load balancers and storage. By default, one Kubernetes master and 3 agents are created, and their VM size is Standard_D2_v2.
Step 3 - Log in to the Kubernetes master node and deploy the MapR Data Platform
In the azm container, issue the “ssh-master” command:
[root@azm ~]# ssh-master
This logs you in to the Kubernetes master node. To check whether the Kubernetes cluster is ready, issue this command at the prompt, for example:
root@k8s-master-C5E9779-0:~# kubectl get nodes
NAME                   STATUS                     AGE
k8s-agent-c5e9779-0    Ready                      5m
k8s-agent-c5e9779-1    Ready                      5m
k8s-agent-c5e9779-2    Ready                      5m
k8s-master-c5e9779-0   Ready,SchedulingDisabled   5m
The output indicates that one master and 3 agents are ready. You can now move on to deploying a MapR cluster with the “deploy-mapr” command. Again, the -h option gives you the help menu:
root@k8s-master-C5E9779-0:~# deploy-mapr -h
Usage: deploy-mapr [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --maprv=MAPRV, --mapr-version=MAPRV
                        MapR version. default: 520
  --mep=MEP             MEP version. default: 2.0
  -c CLNAME, --cluster-name=CLNAME
                        MapR cluster name. default: mapr520
  -n NNODES, --mapr-nodes=NNODES
                        MapR cluster size. default: 3
  -a ADMIN, --admin-user=ADMIN
                        MapR admin username. default: mapruser
  -p PASSWD, --admin-password=PASSWD
                        MapR admin user password
  -s MODE, --security-mode=MODE
                        MapR security mode: base, sec or kdc. default: base
  -d LDAPUSER, --ldap-user=LDAPUSER
                        MapR ldap username. default: ldapuser
  -q, --quiet           don't print status messages to stdout
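The readiness check can also be scripted rather than read by eye. The sketch below counts the nodes whose STATUS column starts with "Ready" and polls until the expected number is reached. Both function names are hypothetical helpers, and the default of 4 nodes assumes the default cluster of one master and three agents.

```shell
# Count nodes whose STATUS column (second field) starts with "Ready".
# Reads `kubectl get nodes` output on stdin; the header line and any
# NotReady nodes are skipped because their second field does not match.
count_ready_nodes() {
    awk '$2 ~ /^Ready/ { n++ } END { print n + 0 }'
}

# Poll until at least $1 nodes are Ready (default 4: one master, three agents).
wait_for_nodes() {
    expected=${1:-4}
    until [ "$(kubectl get nodes | count_ready_nodes)" -ge "$expected" ]; do
        echo "Waiting for $expected nodes to become Ready..."
        sleep 10
    done
    echo "All $expected nodes are Ready."
}
```

Running `wait_for_nodes` on the master node returns once the whole cluster reports Ready; pass a different number if you deployed with a non-default agent count.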
At the minimum, you should provide an admin password to manage MapR, for example:
root@k8s-master-C5E9779-0:~# deploy-mapr -p M@prtest1
This kicks off the MapR installation. By default, 3 MapR containers are deployed, along with an LDAP container for user directory lookup, a Metastore container for Apache Hive, a MapR client container, a squid proxy container, and a cockpit container that is used to visualize and manage the Kubernetes cluster.
About halfway through, you will see messages like the following. They indicate that Kubernetes is configuring the Azure load balancer so that you can access the cockpit portal from the internet, for example:
Waiting for load balancer to open up cockpit port - 5 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 10 seconds, est. 5 min
……..
Waiting for load balancer to open up cockpit port - 185 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 190 seconds, est. 5 min
Please point your browser's at http://220.127.116.11:9090 for cockpit access.
Now point your browser at that URL and you should see the cockpit portal. Log in as root with the password you provided for the MapR admin above.
Once you are logged in, you will see the console shown below. Click the “Cluster” tab at the top, then click “Topology” in the left pane and select “mapr520” in the Project drop-down menu. See the screenshots below.
You should now see an animation of your MapR cluster being deployed on Kubernetes. The black circles are the VMs in the Kubernetes cluster, and the blue circles are the MapR containers that are being spun up.
Wait until the MapR deployment is finished. You should see messages similar to the following on your Kubernetes master console:
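The same progress can be followed from a second shell on the master node. The sketch below lists the pods that have not reached the Running state yet. It assumes the MapR containers live in a Kubernetes namespace named mapr520 (matching the project name in cockpit), and both function names are hypothetical helpers.

```shell
# Print name and status of every pod that is not yet Running.
# Reads `kubectl get pods --no-headers` output on stdin, where the
# third column is STATUS.
not_running() {
    awk '$3 != "Running" { print $1, $3 }'
}

# Hypothetical helper: show what is still starting up in the
# mapr520 namespace (namespace name is an assumption).
mapr_pods_pending() {
    kubectl --namespace=mapr520 get pods --no-headers | not_running
}
```

When `mapr_pods_pending` prints nothing, every pod in the namespace is Running.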
<snip>
Waiting for load balancer to open up proxy port - 40 seconds, est. 5 min
Waiting for load balancer to open up proxy port - 45 seconds, est. 5 min
Waiting for load balancer to open up proxy port - 50 seconds, est. 5 min
Waiting for load balancer to open up proxy port - 55 seconds, est. 5 min
All Done!!
===============================================
Please point your browser's at http://18.104.22.168:9090 for cockpit access.
Please configure your browser's proxy setting to IP: 22.214.171.124 and Port: 30128
and then point your browser at https://mapr520-node0:8443 to access MCS
You can also point your browser at http://mapr520node0:8047 to access Apache Drill
===============================================
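If you saved the installer output to a file, the proxy address can be pulled out of it instead of being copied by hand. This is a small sketch that assumes the IP and port appear on the same output line, as in the message above; the function name and the log file name in the usage note are hypothetical.

```shell
# Print the squid proxy address as IP:PORT from deploy-mapr output on stdin.
# Assumes the "proxy setting to IP: ... and Port: ..." text is on one line.
extract_proxy() {
    sed -n 's/.*proxy setting to IP: \([0-9.]*\) and Port: \([0-9]*\).*/\1:\2/p' | head -n 1
}
```

For example, after running `deploy-mapr -p <password> | tee deploy-mapr.log`, the command `extract_proxy < deploy-mapr.log` prints the address to enter in the browser's proxy settings.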
To get inside these containers, click on “Containers” in the left pane and highlight your desired container (e.g. mapr-client).
Once you are in the mapr-client container, you can issue certain commands such as:
Step 4 - Configure your browser to access MapR Control System (MCS)
We have also deployed a squid proxy container that allows you to access the MapR cluster. To use it, open your browser’s proxy settings (Firefox is used as the example here), fill in the HTTP proxy field with the proxy IP reported at the end of step 3, and enter 30128 as the port. Click OK. The following two screenshots show an example.
Now point your browser at MCS at https://mapr520-node0:8443 and log in as user mapr with the admin password you provided in step 3. You should be able to start managing the MapR cluster from the MCS portal. The cluster comes with a basic license that is sufficient to get you started. However, if you wish to explore the full feature set of the MapR Data Platform, such as MapR Database and MapR Event Store, you will have to install a free 30-day unlimited trial license. You can do so by following the section “Apply the Trial License” in my previous blog post.
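The proxy can also be exercised from a terminal before touching the browser settings. The sketch below asks curl to fetch the MCS URL through the squid proxy and print only the HTTP status code. The function name is hypothetical, and the `-k` flag rests on the assumption that MCS presents a self-signed certificate.

```shell
# Request MCS through the squid proxy and print the HTTP status code.
# $1 = proxy address as IP:PORT (the values reported at the end of step 3)
# -x sends the request via the proxy; -k skips certificate verification,
# assuming MCS uses a self-signed certificate.
check_mcs() {
    curl -k -s -o /dev/null -w '%{http_code}\n' -x "http://$1" https://mapr520-node0:8443
}
```

A call such as `check_mcs 22.214.171.124:30128` that prints a status code other than 000 suggests the proxy is forwarding requests to the cluster.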
Step 5 – Deploy PACC services
Tug Grall wrote a great blog post on how to get started with the MapR PACC. In short, it uses a sensor PACC that collects its host’s performance stats (IO, memory, CPU load, etc.) and publishes them to a MapR Event Store topic. A web server PACC then consumes these stream messages and serves them over HTTP so you can view them in a browser.
I have prepared a script in /opt/pacc on the Kubernetes master node. Execute it as follows:
root@k8s-master-238E7C1E-0# cd /opt/pacc
root@k8s-master-238E7C1E-0:/opt/pacc# bash deploy_pacc
deployment "sensor-deploy" created
deployment "websr-deploy" created
service "mapr-pacc-svc" created
Waiting for load balancer to open up cockpit port - 5 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 10 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 15 seconds, est. 5 min
…….
Waiting for load balancer to open up cockpit port - 90 seconds, est. 5 min
Waiting for load balancer to open up cockpit port - 95 seconds, est. 5 min
PACC deployment Done... point your browser at http://126.96.36.199
In the browser, you can see these stats are refreshed every 3 seconds as they are published and consumed in real time.
Step 6 – Scale your PACC deployment according to demand
One very nice feature of Kubernetes is its ability to scale your services up and down dynamically according to demand, with no downtime required. In your cockpit window, you can see that the script in the previous step spun up one sensor container and two web server containers. The web servers are spread across two VMs (black circles) for high availability. Both are attached to a service (orange circle) for load balancing. The service is associated with a public internet IP address on the Azure load balancer to allow external access.
Now suppose the demand suddenly peaks and you need more web server containers to serve the load. Issue the following command on the Kubernetes master node to scale the number of web servers from 2 to 8:
root@k8s-master-238E7C1E-0:/opt/pacc# kubectl --namespace=pacc scale deployment websr-deploy --replicas=8
In a few seconds, you should see 6 more web servers pop up in the cockpit window to handle the higher load. Kubernetes also makes sure that these new containers are distributed across the hosts (black circles) as evenly as possible. Note that the other window, where the stats are displayed, keeps updating without any disruption.
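The scale-out can be confirmed from the command line as well as in cockpit. The sketch below issues the same scale command and then polls the deployment until the requested number of replicas is available. The function names are hypothetical, and reading `.status.availableReplicas` via jsonpath assumes your kubectl version reports that field for deployments.

```shell
# Print how many replicas of deployment $1 in the pacc namespace are available
# (0 if the field is not reported yet).
available_replicas() {
    n=$(kubectl --namespace=pacc get deployment "$1" -o jsonpath='{.status.availableReplicas}')
    echo "${n:-0}"
}

# Scale deployment $1 to $2 replicas and wait until they are all available.
scale_and_wait() {
    kubectl --namespace=pacc scale deployment "$1" --replicas="$2"
    until [ "$(available_replicas "$1")" -eq "$2" ]; do
        sleep 2
    done
    echo "$1 now has $2 available replicas"
}
```

For example, `scale_and_wait websr-deploy 8` scales out for the peak, and `scale_and_wait websr-deploy 2` brings the deployment back to its original size once demand drops.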
Lastly, should you ever want to redeploy the cluster, first remove the existing cluster, e.g. kubectl delete namespace mapr520, and then run “deploy-mapr -n
We have demonstrated how to spin up a MapR cluster on Azure Container Service and how to manage PACC with Kubernetes orchestration. The benefit of this solution is a faster development cycle and a shorter time to market. Software developers can confidently test their code in this environment before releasing to production.