How to Build a MapR "Super Sandbox" with Hadoop & Spark + Drill

In this blog post, I’ll describe how to install Apache Drill on the MapR Sandbox for Hadoop. The result is a "super" sandbox environment that provides the best of both worlds: a fully functional, single-node MapR/Hadoop/Spark deployment with Apache Drill.

MapR currently offers two separate types of Sandbox VMs: the Sandbox for Hadoop and the Apache Drill Sandbox.

Note that the Apache Drill Sandbox VM download is approximately half the size of the Hadoop Sandbox; this is because it does not contain some of the ecosystem components that are pre-installed on the Hadoop Sandbox, such as Apache Spark, Hue, Oozie, and Pig. The Hadoop Sandbox, on the other hand, basically has everything pre-installed except Apache Drill.

There are two things to note before getting started:

  • You should already have the MapR 5.1 Sandbox for Hadoop installed in your environment. If not, please do that first before proceeding with these instructions. Any virtualization environment (VMware, VirtualBox, etc.) should work fine.
  • My instructions assume that you are connected to the MapR Sandbox via ssh as user mapr (the default password for that user is mapr).

Instructions for Building a MapR Super Sandbox

  • Once you've ssh'd into your Sandbox as mapr, you'll need to change to root:
    # su -
    Password: mapr
  • Now, check to make sure that Drill hasn't already been installed:

    # yum list installed | grep drill

    The above command shouldn't return anything, thus indicating that Drill hasn't yet been installed.
  • Next, check the free space available using the df -h command, like so:

    # df -h
    Filesystem                         Size  Used Avail Use% Mounted on
    /dev/mapper/vg_maprdemo-lv_root    8.4G  8.0G     0 100%  /
    tmpfs                              2.9G     0  2.9G   0%  /dev/shm
    /dev/sda1                          477M   41M  411M  10%  /boot
    localhost:/mapr                    100G     0  100G   0%  /mapr
    localhost:/mapr/demo.mapr.com/user  15G  5.2G  9.8G  35%  /user
    
    

The first row of the output above (the filesystem mounted on /) shows the root volume on my MapR Sandbox. It's full, so I can't install anything else. If you see the same thing on your Sandbox, you'll need to extend the storage space of this volume. Please complete the steps listed in “How To Extend The MapR Sandbox VM's Storage Space” before proceeding with the instructions below. You'll probably want at least 2-3 GB of available space on the root volume for a successful installation.
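
The two pre-flight checks above can also be wrapped in a small script. The following is only a minimal sketch and not part of the original instructions: the function names are my own, the 2 GB default is simply the lower end of the range suggested above, and it assumes the POSIX `df -P` output format and that yum is available (as it is on the Sandbox).

```shell
#!/bin/sh
# Sketch of the pre-flight checks. Function names and the 2 GB default
# threshold are illustrative assumptions, not MapR tooling.

# Read `df -Pk <mountpoint>` output on stdin and print the available
# space in whole GB (column 4 of the second line is available KB).
avail_gb_from_df() {
    awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed if the root volume has at least $1 GB available (default 2).
root_has_space() {
    min_gb="${1:-2}"
    avail="$(df -Pk / | avail_gb_from_df)"
    [ "$avail" -ge "$min_gb" ]
}

# Succeed if no installed package listing mentions "drill".
# Assumes yum is present; if it is missing this also reports success.
drill_not_installed() {
    ! yum list installed 2>/dev/null | grep -qi drill
}
```

As root, something like `drill_not_installed && root_has_space 2 && echo "ready to install"` would then confirm both conditions in one go.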

  • Now that the checks are done, the installation process can begin. First, stop the cluster:

    # service mapr-warden stop
    # service mapr-zookeeper stop 
    
    
  • Then, install the Apache Drill package:

    # yum install mapr-drill

  • Now run configure.sh to update the node configuration:

    # /opt/mapr/server/configure.sh -R

Note that the -R option causes configure.sh to reuse the existing CLDB and ZooKeeper node information, which it reads from mapr-clusters.conf and warden.conf (respectively).

  • Validate consistency between hosts, cluster config, and Drill config files for cluster name and domain:

    # cat /etc/hosts
    127.0.0.1         localhost.localdomain localhost
    192.168.223.156   maprdemo
    
    
    # cat /opt/mapr/conf/mapr-clusters.conf
    demo.mapr.com secure=false maprdemo:7222
    
    
    # cat /opt/mapr/drill/drill-1.6.0/conf/drill-override.conf    (note the Drill version number in this path)
    drill.exec: {
      cluster-id: "demo_mapr_com-drillbits",
      zk.connect: "maprdemo:5181"
    }
    
    
  • Re-start the cluster:

    # service mapr-zookeeper start
    # service mapr-warden start
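
If Drill does not come up after the restart, the consistency check from the earlier step is the first thing to revisit. It can be sketched as a script like the one below. This is a hedged example rather than an official tool: the function names are hypothetical, and the cluster-id convention it checks (dots in the cluster name replaced by underscores, followed by "-drillbits") is inferred from the sample output shown earlier.

```shell
#!/bin/sh
# Sketch of the cluster-name / cluster-id consistency check.
# The naming convention is an assumption inferred from the example
# output (demo.mapr.com -> demo_mapr_com-drillbits).

# Print the cluster name: first field of the first line of mapr-clusters.conf.
cluster_name() {
    awk 'NR == 1 { print $1 }' "$1"
}

# Print the cluster-id expected for a given cluster name.
expected_cluster_id() {
    printf '%s-drillbits\n' "$(printf '%s' "$1" | tr '.' '_')"
}

# Print the cluster-id value found in drill-override.conf.
configured_cluster_id() {
    sed -n 's/.*cluster-id:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Compare the two. Both paths are arguments so that the Drill version
# in the path is not hard-coded.
check_cluster_id() {
    clusters_conf="$1"
    drill_conf="$2"
    want="$(expected_cluster_id "$(cluster_name "$clusters_conf")")"
    have="$(configured_cluster_id "$drill_conf")"
    if [ "$want" = "$have" ]; then
        echo "OK: cluster-id is $have"
    else
        echo "MISMATCH: expected $want, found $have"
        return 1
    fi
}
```

On the Sandbox described here, it would be called as `check_cluster_id /opt/mapr/conf/mapr-clusters.conf /opt/mapr/drill/drill-1.6.0/conf/drill-override.conf`, adjusting the Drill version in the second path to match your installation.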
    
    

You should now be able to access the Apache Drill Web Console using your host system's browser on port 8047:

http://<your-sandbox-ip-addr>:8047/

For example, your URL might look something like this:

http://192.168.223.156:8047/
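
If you'd rather confirm this from a terminal than a browser, a quick probe with curl is one option. The snippet below is a minimal sketch under a few assumptions: the function names are my own, curl is installed on the machine you run it from, and the console answers with a plain HTTP 200 at its root URL (which may not hold if authentication is enabled).

```shell
#!/bin/sh
# Sketch: probe the Drill Web Console. Function names are illustrative.

# Build the console URL for a host (port 8047, as described above).
drill_console_url() {
    printf 'http://%s:8047/\n' "$1"
}

# Succeed if the console answers with HTTP 200 within 5 seconds.
# curl reports code 000 when it cannot connect at all.
drill_console_up() {
    code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$(drill_console_url "$1")")"
    [ "$code" = "200" ]
}
```

For example, `drill_console_up 192.168.223.156 && echo "Drill console is up"` would print the message only once the console is reachable. Warden can take a little while to start the Drillbit after a restart, so a failed probe right away is not necessarily a problem.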

In this blog post, you learned how to install Apache Drill on the MapR Sandbox for Hadoop, resulting in a “Super Sandbox” that is a fully-functional, single-node MapR/Hadoop deployment with Apache Drill.

If you’d like to take a deeper dive into Apache Drill, take a look at these resources.

If you have any questions about how to build a MapR “Super Sandbox,” please post them in the comments section below.


This blog post was published July 08, 2016.