In this tutorial, you’ll learn how you can deploy your MapR clusters with just one click on your private cloud infrastructure. In order to set this up, you will use open source software for creating clouds, and a plugin to spin up MapR clusters on demand.
OpenStack is open source software for creating private/public clouds. It provides a complete technology stack similar to the one provided by major public cloud services, including management of virtual machines, storage, and networking.
For our demonstration, we will use DevStack. DevStack is a quick way to set up a development environment for OpenStack; it provides and maintains the tools used to install OpenStack services from source.
Sahara is the Hadoop data processing module/plugin within OpenStack. It provides a solution for users who want to deploy Hadoop clusters or run big data applications in a cloud environment.
1) First, let’s install git:
sudo apt-get install git
2) Clone the repo into the default user's directory (say, the MapR user's) to get the initial setup scripts:
git clone https://github.com/wochanda/devstack.git -b stable/juno
3) Create the DevStack user:
4) Switch to stack user and enter user directory:
sudo su stack
5) Re-clone the repo, this time as stack user:
sudo git clone https://github.com/wochanda/devstack.git -b stable/juno
Set the environment variables that we will need to run the CLI:
source /opt/stack/devstack/openrc admin admin
6) Clone the MapR plugin repository from GitHub to a local directory in your OpenStack environment:
sudo git clone https://github.com/mapr/sahara.git /opt/stack/sahara
7) Verify that the MapR plugin is registered with Sahara in the setup.cfg file (/opt/stack/sahara/setup.cfg):
vanilla = sahara.plugins.vanilla.plugin:VanillaProvider
hdp = sahara.plugins.hdp.ambariplugin:AmbariPlugin
cdh = sahara.plugins.cdh.plugin:CDHPluginProvider
mapr = sahara.plugins.mapr.plugin:MapRPlugin
fake = sahara.plugins.fake.plugin:FakePluginProvider
spark = sahara.plugins.spark.plugin:SparkProvider
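The check in step 7 can also be scripted. The following is a minimal sketch, not part of the original setup: the helper name has_mapr_entry_point is hypothetical, and the path is the one given in step 7.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given setup.cfg registers the MapR plugin.
has_mapr_entry_point() {
    # -q: no output, -E: extended regex; tolerate flexible spacing around '='.
    grep -qE '^[[:space:]]*mapr[[:space:]]*=[[:space:]]*sahara\.plugins\.mapr\.plugin:MapRPlugin[[:space:]]*$' "$1"
}

cfg=/opt/stack/sahara/setup.cfg
# Only run the check when the file is present on this machine.
if [ -f "$cfg" ]; then
    if has_mapr_entry_point "$cfg"; then
        echo "MapR plugin entry point found in $cfg"
    else
        echo "MapR plugin entry point missing from $cfg" >&2
    fi
fi
```

If the entry is reported missing, add the mapr line shown above to the plugin list in setup.cfg before running stack.sh.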
8) Verify that the MapR plugin entry exists in the sahara.conf file (/etc/sahara/sahara.conf):
use_floating_ips = false
plugins = vanilla,mapr,hdp
debug = True
verbose = True
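As a rough sketch, the same verification can be done from the shell. The helper name mapr_plugin_enabled is hypothetical; it only assumes that the file contains a single-line "plugins = ..." setting like the one shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if "mapr" is an item on the plugins line.
mapr_plugin_enabled() {
    # Print the value of the first "plugins = ..." line, drop spaces,
    # split on commas, and look for an item that is exactly "mapr".
    sed -n 's/^[[:space:]]*plugins[[:space:]]*=[[:space:]]*//p' "$1" |
        head -n 1 | tr -d ' ' | tr ',' '\n' | grep -qx 'mapr'
}

conf=/etc/sahara/sahara.conf
# Only run the check when the file is present on this machine.
if [ -f "$conf" ]; then
    if mapr_plugin_enabled "$conf"; then
        echo "mapr is enabled in $conf"
    else
        echo "mapr is not listed in the plugins line of $conf" >&2
    fi
fi
```

Matching the whole item (grep -x) avoids a false positive on a plugin whose name merely contains "mapr".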
9) In /opt/stack/devstack/local.conf, add HOST_IP and the details of the active network interface that DevStack can use when spinning up VMs on the cloud:
10) Add "SAHARA_REPO" and "SAHARA_ENABLED_PLUGINS", and also verify that "SAHARA_BRANCH" is set correctly under the Sahara configs:
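For illustration only, the settings from steps 9 and 10 might look like the fragment below. Every value is a placeholder or an assumption rather than the original configuration: the IP address, the interface name, and the branch are made up, FLAT_INTERFACE is assumed to be the interface variable in use, and the plugin list simply mirrors the sahara.conf entry from step 8. Only the repository URL comes from step 6.

```shell
# Lines for the localrc section of /opt/stack/devstack/local.conf.
# All values are placeholders (assumptions); substitute your own.
HOST_IP=192.0.2.10            # placeholder: this host's IP address
FLAT_INTERFACE=eth0           # assumption: interface variable and device name
SAHARA_REPO=https://github.com/mapr/sahara.git
SAHARA_BRANCH=master          # placeholder: the branch you intend to deploy
SAHARA_ENABLED_PLUGINS=vanilla,mapr,hdp
```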
11) Add the following line to the file "/opt/stack/devstack/lib/infra":
echo "oslo.log" >> $REQUIREMENTS_DIR/global-requirements.txt
12) Now run the script to set up Sahara on dev/openstack:
./stack.sh (this will take a while, roughly 400 seconds)
Note: If you would like to view logs, or start or stop a process, attach to the DevStack screen session.
The Horizon UI is now available at http://
The default user is: admin
The password is: mapr
Under Admin → Hypervisors, you can see the host resources that DevStack can use to spin up VMs on the cloud.
Now we have to set up different templates that we can later use to spin up MapR clusters on demand.
Step 1: Adding MapR Images to OpenStack
Note: I used a pre-built MapR Distribution image.
There are a few MapR distribution pre-built images for Ubuntu and CentOS which can be found at the following locations:
Once completed, you should be able to see the image you created in an active state.
Step 2: Creating a Flavor
In this step, you will create a flavor: a node sizing template (vCPUs, RAM, and disk) that you can reuse later for MapR deployments.
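Flavors can also be created from the command line with nova flavor-create, which takes a name, an ID, RAM in MB, disk in GB, and a vCPU count. The sketch below only prints the command so that it can be reviewed before running it; the helper name flavor_cmd, the flavor name mapr.node, and the sizes are hypothetical examples, not values from this setup.

```shell
#!/bin/sh
# Hypothetical helper: print a nova flavor-create command for review.
# Arguments: NAME RAM_MB DISK_GB VCPUS. "auto" asks nova to pick the flavor ID.
flavor_cmd() {
    printf 'nova flavor-create %s auto %s %s %s\n' "$1" "$2" "$3" "$4"
}

# Example sizes only (assumptions): 8 GB RAM, 80 GB disk, 4 vCPUs.
flavor_cmd mapr.node 8192 80 4
```

Run the printed command after sourcing openrc with the admin credentials, adjusting the sizes to what your host can actually provide.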
Step 3: Registering Images for the MapR Plugin
Sahara users who want to provision clusters have to specify additional properties for images that were previously added in Step 1.
Use the Image Registry to register images for use with the MapR Sahara plugin.
In the User Name field, enter ubuntu for Ubuntu images.
Select and add MapR Plugin and 3.1.1 Version tags, then click the Add plugin tags button.
3. Finally, click Done to get this image registered with the Sahara Plugin.
Step 4: Creating MapR Node Group Templates
In this step, we will create node group templates. These templates describe the type of workload for a node in a cluster. For instance, I will create a control node and data node template for my two-node cluster.
Step 5: Creating MapR Cluster Templates
The last step is to create templates for MapR clusters so that users can launch clusters with just one click.
1) Define cluster templates by referencing existing node group templates, depending on the number of nodes needed in the cluster.
2) Go to Project > Data Processing > Cluster Templates > Create Template.
Select the MapR plugin name and 3.1.1 Hadoop version, then click Create.
3) Enter the template name when the Create Cluster Template box opens. On the Node Groups tab, select node group templates (click the + sign) and specify the number of nodes per group in the Count column. Select one control node and one data node, and click Create.
Finally, click Launch Cluster, specify a cluster name, and start the launch.
Once the cluster is launched, its status should show Active, which indicates that the cluster is up and ready for you to run jobs against it.
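If you prefer to wait for the Active status from a script rather than refreshing the UI, a generic polling loop is one option. This is a sketch under assumptions: the helper name wait_for_active is hypothetical, and the command that prints the cluster status (for example, a small wrapper around the Sahara CLI or REST API) is left to you, since its exact output format is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: poll a status command until it prints "Active".
# Usage: wait_for_active MAX_TRIES SLEEP_SECONDS COMMAND [ARGS...]
# Returns 0 on Active, 2 if the status is Error, 1 if tries run out.
wait_for_active() {
    tries=$1
    pause=$2
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        status=$("$@")                       # run the caller's status command
        if [ "$status" = "Active" ]; then return 0; fi
        if [ "$status" = "Error" ]; then return 2; fi
        i=$((i + 1))
        sleep "$pause"
    done
    return 1
}
```

For example, wait_for_active 60 30 my_status_command would check every 30 seconds for up to 30 minutes, where my_status_command is your own script that prints just the status word.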
We can also create multiple cluster templates by reusing the same node templates and spinning up different clusters for various use cases.
In this tutorial, you learned how to set up MapR on a private cloud using Sahara on DevStack. Let us know if you have any feedback on the tutorial, or if you are running into any issues.