In this tutorial, you'll get familiar with the MapR Control System dashboard, learn how to get data into the cluster (and organized), and run some MapReduce jobs on Hadoop. You can read the following sections in order or browse them as you explore on your own:
Once you feel comfortable working with the MapR Virtual Machine, you can move on to more advanced topics:
- Working with Snapshots, Mirrors, and Schedules
- Getting Started with Hive
- Getting Started with Pig
- Getting Started with HBase
The dashboard, the main screen in the MapR Control System, shows the health of the cluster at a glance. To get to the dashboard, click the MapR Control System link on the desktop of the MapR Virtual Machine and log on with the username mapr and the password mapr. If it is your first time using the MapR Control System, you will need to accept the terms of the license agreement to proceed.
Parts of the dashboard:
- To the left, the navigation pane lets you navigate to other views that display more detailed information about nodes in the cluster, volumes in the MapR Storage Services layer, NFS settings, Alarms Views, and System Settings Views.
- In the center, the main dashboard view displays the nodes in a "heat map" that uses color to indicate node health; since there is only one node in the MapR Virtual Machine cluster, there is a single green square.
- To the right, information about cluster usage is displayed.
- Try clicking the Health button at the top of the heat map. You will see different kinds of information that can be displayed in the heat map.
- Try clicking the green square representing the node. You will see more detailed information about the status of the node.
The browser is pre-configured with the following bookmarks, which you will find useful as you gain experience with Hadoop, MapReduce, and the MapR Control System:
- MapR Control System
- JobTracker Status
- TaskTracker Status
- HBase Master
- CLDB Status
Don't worry if you aren't sure what those are yet.
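If you want to reach these status pages directly, each service exposes a web UI on its own port. The URLs below are assumptions based on common defaults for this generation of MapR and Hadoop; the ports may differ in your version:

```
https://localhost:8443    # MapR Control System
http://localhost:50030    # JobTracker Status
http://localhost:50060    # TaskTracker Status
http://localhost:60010    # HBase Master
http://localhost:7221     # CLDB Status
```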
With MapR, you can mount the cluster via NFS, and browse it as if it were a filesystem. First, make sure the cluster is mounted via NFS:
- Click the terminal icon at the top of the screen to open the terminal.
- Type showmount in the terminal to see which hosts are mounted on mapr-desktop.
- If no hosts are listed, use the mount command to mount the cluster.
- Type showmount again to verify that the cluster is successfully mounted.
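The steps above can be sketched as a terminal session. The mount point /mapr and the NFS options are assumptions; adjust them to match your VM:

```shell
# See which hosts currently have the NFS share mounted
showmount

# If nothing is listed, mount the cluster over NFS
# (localhost:/mapr and the target /mapr are assumed defaults)
sudo mkdir -p /mapr
sudo mount -o hard,nolock localhost:/mapr /mapr

# Verify that the mount succeeded
showmount
ls /mapr
```

The hard and nolock options are a common choice for MapR NFS mounts, but any options your environment requires will work; the point is simply that the cluster appears as an ordinary directory tree once mounted.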
With the cluster mounted via NFS, try double-clicking the MapR NFS icon on the MapR Virtual Machine desktop.
When you navigate to mapr > my.cluster.com, you can see the user volume that is preconfigured in the VM.
Try copying some files to the volume; a good place to start is the two files attached to this page, one of which is sample-table.txt. Both are text files, which will be useful when running the Word Count example later.
- To download them, select Attachments from the Tools menu at the top right of this document (the one you are reading now), then click the links for those two files.
- Once they are downloaded, you can add them to the cluster.
- Since you'll be using them as input to MapReduce jobs in a few minutes, create a directory named mapr (the same name as the user) in the user volume, create another directory called in under the mapr directory, and drag the files there.
By the way, if you want to verify that you are really copying the files into the Hadoop cluster, you can open a terminal on the MapR Virtual Machine (select Applications > Accessories > Terminal) and type hadoop fs -ls /user/mapr/in to see that the files are there.
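Because the cluster is mounted over NFS, the same directory is visible both through ordinary Linux tools and through the Hadoop client. Assuming the default mount point /mapr and the cluster name my.cluster.com, these two listings should show the same files:

```shell
# Through the NFS mount (any standard Linux tool works here)
ls /mapr/my.cluster.com/user/mapr/in

# Through the Hadoop filesystem shell
hadoop fs -ls /user/mapr/in
```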
When you run MapReduce jobs, and when you use Hive, Pig, or HBase, you'll be working with the Linux terminal. Open a terminal window by selecting Applications > Accessories > Terminal.
Running a MapReduce Job
In this section, you will run the well-known Word Count MapReduce example. You'll need one or more text files (like the ones you copied to the cluster in the previous section). The Word Count program reads files from an input directory, counts the words, and writes the results of the job to files in an output directory. For this exercise you will use /user/mapr/in for the input and /user/mapr/out for the output. The input directory must exist and must contain the input files before you run the job; the output directory must not exist, because the Word Count example creates it.
- On the MapR Virtual Machine, open a terminal (select Applications > Accessories > Terminal)
- Copy a couple of text files into the cluster (see the previous section if you are not sure how). Create the directory /user/mapr/in and put the files there.
- Run the Word Count job with the hadoop jar command, giving it the input and output directories.
- Look in the newly created /user/mapr/out directory for a file called part-r-00000 containing the results.
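Put together, a typical Word Count session looks like the sketch below. The examples JAR path is an assumption that varies with the Hadoop version bundled in the VM; check under /opt/mapr/hadoop for the actual name:

```shell
# Create the input directory and copy local text files into it
hadoop fs -mkdir /user/mapr/in
hadoop fs -put ./*.txt /user/mapr/in

# Run the Word Count example (JAR path is hypothetical; adjust to your install)
hadoop jar /opt/mapr/hadoop/hadoop-*/hadoop-*examples.jar \
    wordcount /user/mapr/in /user/mapr/out

# Inspect the results
hadoop fs -cat /user/mapr/out/part-r-00000 | head
```

If the job complains that the output directory already exists, remove it with hadoop fs -rmr /user/mapr/out and run the job again.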
That's it! If you're ready, you can try working with MapR tables.