Spinning Up a Hadoop Cluster in the Cloud

I often get asked, “What is the easiest way to get hands-on experience with MapR?” The best way is to try the MapR Sandbox, a single-node MapR cluster that you can run on your laptop. However, Hadoop clusters are never built with just one server, and some MapR features require multiple nodes, or even multiple clusters. To get hands-on with a MapR installation that more closely resembles what you might deploy on hardware, I suggest you deploy a MapR cluster in the Amazon cloud, using the MapR Installer. This blog post will walk you through that process.

Make sure that you have an account set up with Amazon Web Services (AWS), with billing enabled so that you can provision a virtual infrastructure.

Step 1 - Create Amazon EC2 Instances

From the AWS EC2 Console, choose ‘Instances’ from the left bar, and click the ‘Launch Instance’ button. This will load a new page where you can describe the instance you want to deploy. Suggestions on how to proceed are below:

  1. Choose AMI
    • Here, you can choose Red Hat, Ubuntu, SUSE, or CentOS.
    • I like to deploy with CentOS 6, which can be found by clicking ‘AWS Marketplace’ on the left side, searching for ‘centos 6’, and selecting ‘CentOS 6 (x86_64) - with Updates.’
  2. Choose Instance Type
    • MapR runs on instances with 8GB of RAM or higher. Since you are dealing with big data, you will want to pick an instance type with decent storage attached.
    • My suggestion for a basic proof of concept cluster is m1.xlarge.
  3. Configure Instance
    • All you need to do here is specify the number of instances you want to deploy. For a basic proof of concept, three should do.
  4. Add Storage
    • If you chose an instance type with local storage, you need to add it here. For m1.xlarge, click ‘Add New Volume’, and choose ‘Instance Store 0’ in the leftmost dropdown.
    • Repeat for Instance Store 1 through 3.
  5. Tag Instance
    • Give your instances a name, such as ‘mapr-cluster.’
  6. Configure Security Group
    • To get up and running quickly, you can open the firewall by finding the dropdown that says ‘SSH’ and changing it to ‘All TCP.’
    • Note - this is not secure. For production installations you should customize your firewall rules for specific MapR services.
  7. Click ‘Review and Launch’, and then ‘Launch.’
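
If you would rather script the console choices above, a rough equivalent using the AWS CLI might look like the sketch below. It only prints the command for review; it does not launch anything. The AMI ID, key pair name, and security group ID are placeholders, the `instance_store_mappings` helper is a hypothetical name, and the `/dev/sdb` through `/dev/sde` device names are an assumption (the names the guest OS reports can differ).

```shell
#!/bin/sh
# Sketch: assemble (but do not run) an `aws ec2 run-instances` call that
# mirrors the console choices above. All IDs below are placeholders.

# Build the --block-device-mappings JSON for the first N (up to 4)
# instance-store volumes. The /dev/sdb.. device names are an assumption.
instance_store_mappings() {
  count=$1
  json=""
  i=0
  for letter in b c d e; do
    [ "$i" -ge "$count" ] && break
    json="${json}${json:+,}{\"DeviceName\":\"/dev/sd${letter}\",\"VirtualName\":\"ephemeral${i}\"}"
    i=$((i + 1))
  done
  printf '[%s]\n' "$json"
}

AMI_ID="ami-xxxxxxxx"          # placeholder: the CentOS 6 AMI you chose
KEY_NAME="my-key"              # placeholder: your EC2 key pair name
SECURITY_GROUP="sg-xxxxxxxx"   # placeholder: your security group ID

# Print the command for review rather than executing it.
echo aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type m1.xlarge \
  --count 3 \
  --key-name "$KEY_NAME" \
  --security-group-ids "$SECURITY_GROUP" \
  --block-device-mappings "'$(instance_store_mappings 4)'" \
  --tag-specifications "'ResourceType=instance,Tags=[{Key=Name,Value=mapr-cluster}]'"
```

Printing the command first makes it easy to check the instance count, instance type, and the four instance-store mappings before anything is billed.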

Next, from the ‘Launch Status’ page, click ‘View Instances’ to look at the instances you just created and find out what their IP addresses are. Each instance has a public IP (which is used to access it from outside Amazon) and a private IP (which the instances use to talk to each other). To see both IP addresses, modify the view on the instances page by clicking the gear icon and checking the box next to ‘Private IP Addr.’ Make note of both sets of IP addresses that are displayed, as you’ll need them later.

Step 2 - Install MapR
Now we’ll use our private key to log into one of the Amazon instances, using its public IP address. Before we start the installer, we will also check the paths to the local disks.

$ ssh -i WillAWSKey.pem root@

[root@ip-172-31-40-218 ~]# fdisk -l | grep dev
Disk /dev/xvde: 8589 MB, 8589934592 bytes
/dev/xvde1   *           1        1045     8387584   83  Linux
Disk /dev/xvdf: 450.9 GB, 450934865920 bytes
Disk /dev/xvdg: 450.9 GB, 450934865920 bytes
Disk /dev/xvdh: 450.9 GB, 450934865920 bytes
Disk /dev/xvdi: 450.9 GB, 450934865920 bytes

Above we see that our local disks are available at /dev/xvdf, /dev/xvdg, /dev/xvdh, and /dev/xvdi. Later we will give this information to the installer. Now we’re ready to download and start the MapR Installer.
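
If you have many nodes, picking the data disks out of the `fdisk` output by eye gets tedious. Here is a minimal sketch that filters the output for you, assuming the `Disk /dev/...` and partition line formats shown above and simple `sdX`/`xvdX` style device names. The `list_data_disks` function name is hypothetical, and the comma-separated output is only a convenience; adjust it to whatever form the installer field expects.

```shell
#!/bin/sh
# Read `fdisk -l` style output on stdin and print the disks that have no
# partitions, joined with commas. Assumes partition names are the disk
# name plus a trailing number (e.g. /dev/xvde1 belongs to /dev/xvde).
list_data_disks() {
  awk '
    /^Disk \/dev\// { d = $2; sub(/:$/, "", d); disks[++n] = d }
    /^\/dev\//      { p = $1; sub(/[0-9]+$/, "", p); used[p] = 1 }
    END {
      out = ""
      for (i = 1; i <= n; i++)
        if (!(disks[i] in used))
          out = out (out == "" ? "" : ",") disks[i]
      print out
    }'
}

# Demo using the output captured above. On a real node you would run:
#   fdisk -l | list_data_disks
list_data_disks <<'EOF'
Disk /dev/xvde: 8589 MB, 8589934592 bytes
/dev/xvde1   *           1        1045     8387584   83  Linux
Disk /dev/xvdf: 450.9 GB, 450934865920 bytes
Disk /dev/xvdg: 450.9 GB, 450934865920 bytes
Disk /dev/xvdh: 450.9 GB, 450934865920 bytes
Disk /dev/xvdi: 450.9 GB, 450934865920 bytes
EOF
```

The boot disk /dev/xvde is skipped because it carries a partition, which leaves the four instance-store disks.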

[root@ip-172-31-40-218 ~]# sudo yum install -y wget
[root@ip-172-31-40-218 ~]# wget https://package.mapr.com/releases/installer/mapr-setup.sh -P /tmp
[root@ip-172-31-40-218 ~]# sudo bash /tmp/mapr-setup.sh

At this point you will enter an interactive prompt where you will be asked a few questions about your installation preferences. Simply hit enter to accept the defaults for most questions, other than the password you want for the mapr administrative user. After the installer packages are downloaded and installed, the installer will print a link that you can put into your browser to enter the GUI portion of installation.

The GUI portion of installation is pretty self-explanatory. If you don’t understand a field, just mouse over its tooltip. This video has a full walkthrough on the installation process. Some tips:

  1. Log in with the username and password you just specified in the interactive prompts. You’ll need this password again at the ‘cluster’ phase.
  2. The installer will automate the application of trial licenses, so have your MapR.com credentials handy.
  3. At the ‘Nodes’ page, make sure to specify the internal IP addresses for your EC2 instances (you looked them up in Step 1), the disks you discovered at the beginning of Step 2, and the private key you used to create your EC2 instances.
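
It is easy to paste a public IP into the ‘Nodes’ page by mistake. As a quick sanity check before you fill it in, the sketch below tests whether an address falls in one of the private IPv4 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The `is_private_ipv4` function name is hypothetical, and this is a rough pattern check rather than a full address validator. The 54.0.0.10 address is just an illustrative non-private value.

```shell
#!/bin/sh
# Return success if the argument looks like a private (RFC 1918) IPv4
# address. Rough check only: it does not validate every octet.
is_private_ipv4() {
  case $1 in
    10.*.*.*|192.168.*.*) return 0 ;;
    172.*.*.*)
      second=${1#172.}        # drop the leading "172."
      second=${second%%.*}    # keep only the second octet
      [ "$second" -ge 16 ] 2>/dev/null && [ "$second" -le 31 ]
      return ;;
    *) return 1 ;;
  esac
}

# Demo: the first address matches the node shown in the prompt above;
# the second is an arbitrary non-private example.
for ip in 172.31.40.218 54.0.0.10; do
  if is_private_ipv4 "$ip"; then
    echo "$ip looks private"
  else
    echo "$ip looks public"
  fi
done
```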

Step 3 - Log In to Your Cluster
Once installation is complete, the installer will provide the IP address of the MapR Control System (MCS) for your new cluster, as well as links and IP addresses for any other service endpoints that you installed.

Enjoy your MapR cluster! If you’re wondering what to do next, you may be interested in our tutorials.

This blog post was published December 05, 2014.