How to Create Instant MapR Clusters with Docker


Here at MapR, developer productivity is critical to us. To keep our pace of innovation high and give customers more choice and flexibility in Apache Hadoop and the other open source projects we ship with the MapR Distribution for Hadoop, we apply DevOps methodologies as widely as we can. One critical piece of this is rapidly testing our builds to ensure quality in the codebase. Automation is key here; it is what allows us to integrate the latest community innovations across multiple releases into our Hadoop distribution. For example, we test and support Hadoop 2.7 with Drill 1.1 and Hive 1.0, Hadoop 2.6 with Drill 1.2 and Spark 1.3.1, and so on. For customers supporting 50 or more applications on a single MapR cluster, there are many possible combinations within the MapR Distribution, which allows them to upgrade applications incrementally, saving lots of time and money.

To deliver this fast pace of innovation, we’ve been using Docker extensively. Rather than using physical servers or VMs to provision this multitude of test clusters, we build and maintain Docker images of MapR that can be provisioned on demand. This has reduced the deployment time of a test cluster from hours to seconds!

In this post, we will share the tools and methodology we use to create these Dockerized MapR clusters. We expect that you’ll find these useful as well, both to learn MapR and to test out new applications.


Goals:

  • Create a multi-node MapR cluster.
  • Make the cluster nodes accessible from outside the host running the containers.
  • Launch clusters of different sizes.
  • Use real disks to achieve realistic performance.


Prerequisites:

  • Server running CentOS/RHEL 7.x with 16GB+ RAM
  • Docker 1.6.0+
  • sshpass installed
  • Free, unmounted physical disks to be attached to the MapR node containers
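Before going further, it can help to sanity-check the host against the list above. This is just a sketch; the commands checked and the 16 GB threshold come from the prerequisites, and nothing else is assumed about your environment:

```shell
# Sketch: check the host against the prerequisites listed above.
for cmd in docker sshpass; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: MISSING"
  fi
done

# 16 GB+ RAM is recommended for a multi-node test cluster.
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
echo "RAM: $((mem_kb / 1024 / 1024)) GB"
```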

Network Set-up: While working towards these goals, networking was one of the critical pieces. The containers (cluster nodes) need to be accessible from outside the host, i.e., routable, and we don't want a complex network setup.

Step 1: Set up a routable bridge interface on the host (e.g., br0).

Here is a config example on CentOS 7.0 server:

# cat /etc/sysconfig/network-scripts/ifcfg-br0 

# cat /etc/sysconfig/network-scripts/ifcfg-enp4s0 
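The contents of those two files were not preserved here; below is a minimal sketch of what they might look like, assuming the physical NIC enp4s0 is enslaved to bridge br0. All addresses are hypothetical placeholders; substitute values from your own VLAN:

```
# /etc/sysconfig/network-scripts/ifcfg-br0 (example values only)
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.10      # hypothetical; use an address from your VLAN
NETMASK=255.255.255.0
GATEWAY=192.168.1.1

# /etc/sysconfig/network-scripts/ifcfg-enp4s0 (example values only)
DEVICE=enp4s0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
BRIDGE=br0               # enslave the physical NIC to the bridge
```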

Step 2: Get a free range of routable IP addresses from your network admin, in the same VLAN as the bridge IP address, to be used for the containers.
Eg: We got a small block of addresses; these IPs are handed out to the containers.

Docker configuration:
Configure the Docker daemon with the following options:

    -b=bridge-interface --fixed-cidr=x.x.x.x/mask
    Eg: -b=br0 --fixed-cidr=

This gives the containers routable IP addresses in the above-mentioned range.
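On CentOS 7 with Docker of this vintage, one common place to set these daemon options is /etc/sysconfig/docker. The file path is standard for that platform, but the CIDR value below is a placeholder; substitute the range you were allocated:

```
# /etc/sysconfig/docker (example; the CIDR value is a placeholder)
OPTIONS='-b=br0 --fixed-cidr=192.168.1.64/28'
```

After editing, restart the daemon (`systemctl restart docker`) so the options take effect.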

Disks for the Containers:
Each container requires one disk drive or partition to be used for MapR.
Generate a list of the disks, one per line, in a text file.

Eg : # cat /tmp/disklist.txt 

If the text file lists more disks than the number of containers requested, the remaining disks are added to the first container.
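The example disklist contents did not survive above, so here is a hypothetical one. The device names are placeholders; list your own free, unmounted disks or partitions:

```shell
# Hypothetical disk list -- replace the device names with your own
# free, unmounted disks, one per line.
cat > /tmp/disklist.txt <<'EOF'
/dev/sdb
/dev/sdc
/dev/sdd
/dev/sde
EOF
cat /tmp/disklist.txt
```

With four containers requested, each container would get one of these disks; with fewer containers, the leftovers go to the first container as noted above.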

Download and run the launch script for your MapR version (4.0.2, 4.1.0, or 5.0.0).

    Usage: ./ ClusterName NumberOfNodes MemSize-in-kB Path-to-DisklistFile

    # ./ demo 4 16384000 /tmp/disklist.txt
    Starting the cluster:
    Control Node IP :     login: mapr   password: mapr
    Data Nodes : , ,

Launch the MapR management console using the control node IP from the output of the above example (https://<control-node-ip>:8443).

In this blog post, you’ve learned how to create instant MapR clusters with Docker. If you have any further questions, please ask them in the comments section below.

Are you interested in reading more about working with Docker and MapR? Read the blog post My Experience with Running Docker Containers on Mesos.

This blog post was published September 01, 2015.
