Basic Notes on Configuring Eclipse as a Hadoop Development Environment for MapR

Eclipse is a popular development tool, so we thought it would be helpful to share some tips on using Eclipse with MapR to write MapReduce programs. The following notes describe how to set up Eclipse as your IDE for developing Hadoop MapReduce jobs against an existing MapR cluster.

The major steps are:

  • install Eclipse
  • install and configure the MapR client
  • optionally install and configure an NFS client
  • configure a few things in Eclipse
  • create a project
  • use the built-in Eclipse application launcher to launch your MapReduce jobs from inside Eclipse

Note that no special Eclipse plugin is needed.

Install Eclipse
1) Download Eclipse. I chose Eclipse 3.7 for Java Developers, but other versions should work.
2) Verify that the Java version on your machine is 1.6 by running java -version.
3) Start Eclipse. It will work as long as java is on your PATH.
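Note that java -version reports whichever JVM is first on your PATH, while Eclipse can be configured to launch programs with a different installed JRE. As a hedged sketch (JavaVersionCheck is a hypothetical name, not part of any MapR or Hadoop tooling), the following class, run from inside Eclipse via Run As->Java Application, prints the version of the JVM that Eclipse actually launches with:

```java
// Hypothetical helper: reports the Java version of the JVM running this class.
final class JavaVersionCheck {

  // Extracts the major version from java.specification.version:
  // "1.6" -> 6 (pre-Java 9 numbering), "11" -> 11 (Java 9+ numbering).
  static int majorVersion(String spec) {
    String[] parts = spec.split("\\.");
    int first = Integer.parseInt(parts[0]);
    return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
  }

  public static void main(String[] args) {
    String spec = System.getProperty("java.specification.version");
    int major = majorVersion(spec);
    System.out.println("Java specification version: " + spec);
    if (major != 6) {
      System.out.println("Warning: these notes assume Java 1.6; this JVM is Java " + major);
    }
  }
}
```

Reading java.specification.version rather than parsing java.version sidesteps vendor-specific suffixes such as update numbers.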

Install and Configure the MapR Client

Download and install the MapR client for your platform as described in the MapR documentation.

In brief,

1) Download the client package.
2) Unzip and/or untar it.
3) Create an /opt directory if you don't already have one.
4) Move the extracted files to /opt/mapr.
5) Configure the MapR client by running

Note: there appears to be a bug, recently introduced as part of the Kerberos work in Hadoop, that makes the client command-line tools print this odd error message:

2012-05-17 12:14:37.475 java[8002:1903] Unable to load realm info from SCDynamicStore

This is fixed by setting the HADOOP_OPTS environment variable so that it passes two Kerberos-related -D options to the JVM.

Optionally Install and Configure an NFS Client

One of the great features of MapR is that the entire MapR-FS storage layer can be accessed via NFS. This means that as a developer you can easily check log files and job output, and put test data into MapR-FS for your jobs. To enable Direct Access NFS for your platform, follow the instructions in the MapR documentation mentioned earlier.
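Because the NFS mount makes MapR-FS look like an ordinary directory, plain Java file I/O is enough to stage test input and read back results. A minimal sketch, assuming MapR-FS is mounted under a path such as /mapr/my.cluster.com (a hypothetical cluster name and user directory; substitute your own mount point). The class and method names are illustrative only:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: stages test input and reads results through an NFS mount of MapR-FS.
final class NfsTestData {

  // Creates dir if needed and writes one line per element to dir/fileName.
  static Path writeInput(Path dir, String fileName, List<String> lines) throws IOException {
    Files.createDirectories(dir);
    return Files.write(dir.resolve(fileName), lines, StandardCharsets.UTF_8);
  }

  // Reads a file (for example a job's part-00000 output or a log) line by line.
  static List<String> readLines(Path file) throws IOException {
    return Files.readAllLines(file, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // Assumed mount point and directory; substitute your cluster's NFS path
    Path inputDir = Paths.get(args.length > 0 ? args[0] : "/mapr/my.cluster.com/user/me/wordcount/in");
    Path written = writeInput(inputDir, "sample.txt", Arrays.asList("hello world", "hello mapr"));
    System.out.println("Wrote " + written + ": " + readLines(written));
  }
}
```

The same directory can then be passed to the job as its input path, and the job's output directory can be inspected the same way once it finishes.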

Configure a Few Things in Eclipse
Now that the MapR client is installed and configured, there are a few things you need to do to set up Eclipse for compiling and running MapReduce code.

1) Put the MapR client libraries on the classpath in Eclipse. To make this reusable, I prefer to create a user library, which I named MAPR_HADOOP, in Preferences under Java > Build Path > User Libraries (I use a Mac, but other platforms are similar).

I included all of the JARs under MAPR_INSTALL/hadoop/hadoop-VERSION/lib.

Since MapR uses native code, you'll also need to tell Eclipse where the native library is. The clean way to do this is to expand the user library, select the maprfs JAR, and edit its native library location.

2) In Preferences, also create a classpath variable named HADOOP_CONF that points to the Hadoop conf directory (MAPR_INSTALL/hadoop/hadoop-VERSION/conf).
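To see exactly which JARs end up in the user library, a small helper can list them. This is a sketch that assumes the /opt/mapr install location used above and keeps the VERSION placeholder from the text. Unlike "all of the jars under lib", it looks only at JARs directly in lib, not in subdirectories. HadoopLibJars is a hypothetical name:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists the JARs that make up the MAPR_HADOOP user library.
final class HadoopLibJars {

  // Returns the .jar files directly inside libDir (subdirectories are not searched),
  // sorted by path; returns an empty list if libDir is missing or not a directory.
  static List<File> listJars(File libDir) {
    File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
    List<File> result = new ArrayList<>();
    if (jars != null) {
      Arrays.sort(jars);
      result.addAll(Arrays.asList(jars));
    }
    return result;
  }

  // Joins entries with the platform path separator (":" on Unix, ";" on Windows).
  static String toClasspath(List<File> entries) {
    StringBuilder sb = new StringBuilder();
    for (File entry : entries) {
      if (sb.length() > 0) {
        sb.append(File.pathSeparator);
      }
      sb.append(entry.getPath());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Assumed location: /opt/mapr from the install steps; replace VERSION with your Hadoop version
    File lib = new File(args.length > 0 ? args[0] : "/opt/mapr/hadoop/hadoop-VERSION/lib");
    for (File jar : listJars(lib)) {
      System.out.println(jar.getName());
    }
  }
}
```

The toClasspath output is the command-line equivalent of the user library, which can be handy when comparing against a failing launch.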

At this point you've completed the generic configuration in Eclipse and can reuse these settings in all of your Eclipse projects. I'll create a project now as an example.
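Before creating a real project, it can help to confirm that this configuration is actually visible to a running JVM. Hadoop's Configuration class loads core-site.xml as a classpath resource (and JobConf adds mapred-site.xml), which is why the conf directory has to be on the classpath. The sketch below is a hypothetical helper, not a MapR tool. It checks for those resources and tries to load the native client library. The library name MapRClient is an assumption, so check the native library directory of your install for the real file name. Run it as a Java application in a project that has the MAPR_HADOOP library and HADOOP_CONF on its classpath:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sanity check for the generic Eclipse configuration.
final class EnvironmentCheck {

  // Hadoop's Configuration reads core-site.xml (and JobConf reads mapred-site.xml)
  // as classpath resources; this returns the names the loader cannot find.
  static List<String> missingResources(ClassLoader loader, String... names) {
    List<String> missing = new ArrayList<>();
    for (String name : names) {
      if (loader.getResource(name) == null) {
        missing.add(name);
      }
    }
    return missing;
  }

  // System.loadLibrary maps a short name to a platform-specific file
  // (e.g. "X" -> libX.so on Linux) and searches the library path for it.
  static boolean canLoadNative(String libName) {
    try {
      System.loadLibrary(libName);
      return true;
    } catch (UnsatisfiedLinkError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    System.out.println("Missing config files: "
        + missingResources(loader, "core-site.xml", "mapred-site.xml"));
    System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    // "MapRClient" is an assumed library name; check your MapR install's native directory
    String lib = args.length > 0 ? args[0] : "MapRClient";
    System.out.println("Native library '" + lib + "' loadable: " + canLoadNative(lib));
  }
}
```

An empty "missing" list and a loadable native library suggest that the user library's native location and the HADOOP_CONF variable are being picked up.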

Create a Project

1) Create a new project (File->New->Project) and fill in the usual information.

2) On the Java Settings panel, click Libraries, then Add Library, select User Library, and then choose the MAPR_HADOOP library you created earlier.

3) Once you've created the project, create your first MapReduce source file with File->New->Class. In the wizard, specify that the class extends the base class Configured (Eclipse will help you search for the package) and implements the interface Tool. When you finish, Eclipse generates a new class that extends org.apache.hadoop.conf.Configured and implements org.apache.hadoop.util.Tool, including a stub for the required run() method.

At this point you can develop your MapReduce client. This is just ordinary Java source editing in Eclipse, and since the MapR Hadoop libraries are on the build path, you can use Eclipse's built-in code completion.
Once you have made some progress, you'll likely want to test your job by running it on a cluster. This is quite straightforward in Eclipse.

Use the Built-in Eclipse Application Launcher to Launch Your MapReduce Jobs from Inside Eclipse

You need to do three things to make it possible to easily run a job on a cluster (or a local VM).

1) Ensure that your MapReduce code has the usual boilerplate for submitting a job. The details are best described in the Hadoop literature. Here is a simple example of the two methods your class will need; you'll of course adapt these for your own class:

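import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    if (args.length < 2) {
      System.err.println("Missing required arguments <in path> <out path>");
      return 1;
    }
    // Passing this class lets Hadoop find the JAR to ship to the cluster
    JobConf job = new JobConf(getConf(), this.getClass());
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setJobName("Keys' Word Count");

    // Example mapper/reducer (Hadoop's library word-count classes); use your own here
    job.setMapperClass(TokenCountMapper.class);
    job.setReducerClass(LongSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    JobClient.runJob(job); // submit the job and wait for it to finish
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}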

2) Ensure that the cluster has a user with the same username AND the same numeric UID as the user you run your client as. If the UID isn't consistent, things will not work properly when the MapReduce job runs; you will likely see strange permission errors.

3) Configure an application launcher. Here are the steps.

a) To start your job from inside Eclipse, use the Java Application launcher. You'll need to add the Kerberos settings from earlier, specify Hadoop's JAR launcher as the main class, remove the project from the classpath, and add the Hadoop JARs to the classpath. Removing the project is required because otherwise the launcher finds two copies of the MapReduce class you developed (one in the project and one in the exported JAR), and that prevents the job from running on the cluster because the JAR file isn't copied to the cluster.

b) Create a new launch configuration via Run As->Run Configurations...

Set the main class to org.apache.hadoop.util.RunJar, the Hadoop class for launching JARs (the same class the hadoop jar command uses).

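c) On the Arguments tab, enter the program arguments and the VM arguments. RunJar expects the path to your exported JAR first (followed by your main class, unless the JAR's manifest names it), then the usual job inputs such as the input and output paths. Add the two Kerberos -D options described earlier as VM arguments to eliminate the error messages.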

I used Eclipse's workspace_loc variable rather than an absolute path, which makes the configuration more robust if you share your project with others.

d) Edit the classpath entry.

The project has been removed from the classpath, and I've added the MAPR_HADOOP library created earlier by hand (click Advanced -> Add Library -> User Library). I also added a reference to HADOOP_CONF, which points to the Hadoop conf directory (MAPR_INSTALL/hadoop/hadoop-VERSION/conf).

e) Save your work.

f) To create a JAR file from your project, use Eclipse's built-in export functionality (File->Export->Java->JAR file). I recommend saving the export description in your workspace so you can rerun it easily before each test.
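The two-copies problem from step a) can be made visible by asking the class loader for every classpath location that provides your job class. A sketch, with hypothetical helper names: findCopies lists the locations in search order, and anyUnder reports whether one of them lies under the project's output folder (typically bin/). Under RunJar the class can legitimately appear more than once (from the JAR and from its unpacked copy), so the useful signal is whether the project folder shows up, not the raw count:

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic for duplicate copies of a class on the classpath.
final class DuplicateClassCheck {

  // Asks the class loader for all copies of the class file, in search order.
  static List<URL> findCopies(ClassLoader loader, String className) throws IOException {
    String resource = className.replace('.', '/') + ".class";
    return Collections.list(loader.getResources(resource));
  }

  // True if any copy comes from under the given folder, such as the project's bin/ folder.
  static boolean anyUnder(List<URL> copies, String folder) {
    String prefix = new File(folder).toURI().toString();
    for (URL url : copies) {
      if (url.toString().startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}
```

For example, printing findCopies(WordCount.class.getClassLoader(), WordCount.class.getName()) at the top of the driver's main() under your launch configuration shows where the class is being loaded from.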

With those steps completed, your edit, compile, and debug cycle is:

  • edit code and save; standard Eclipse functionality works.
  • generate a JAR file via export. If you saved the export as a description file in the workspace, just right-click the description and select Create JAR.
  • launch your job using Run As…
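For reference, the Run As launch configured above corresponds roughly to starting the JVM by hand with RunJar as the main class. The sketch below only builds that command line so the pieces are easy to see. LaunchCommand is a hypothetical name, and the classpath, JAR path, and -D values in main are placeholders (the real -D options are the two Kerberos settings described earlier):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the java command line that the Eclipse launch
// configuration roughly corresponds to. It does not start anything.
final class LaunchCommand {

  static List<String> build(String classpath, List<String> jvmOptions, String jarPath, String... jobArgs) {
    List<String> cmd = new ArrayList<>();
    cmd.add("java");
    cmd.addAll(jvmOptions);                    // VM arguments, e.g. the Kerberos -D options
    cmd.add("-cp");
    cmd.add(classpath);                        // Hadoop JARs + conf dir, without the project's bin/
    cmd.add("org.apache.hadoop.util.RunJar");  // Hadoop's JAR launcher as the main class
    cmd.add(jarPath);                          // the exported JAR that gets shipped to the cluster
    cmd.addAll(Arrays.asList(jobArgs));        // program arguments, e.g. <in path> <out path>
    return cmd;
  }

  public static void main(String[] args) {
    // Placeholder values for illustration only
    String classpath = "/opt/mapr/hadoop/hadoop-VERSION/lib/*" + File.pathSeparator
        + "/opt/mapr/hadoop/hadoop-VERSION/conf";
    List<String> cmd = build(classpath, Arrays.asList("-Dsome.property=value"),
        "wordcount.jar", "in", "out");
    System.out.println(String.join(" ", cmd));
  }
}
```

Passing the resulting list to new ProcessBuilder(cmd).start() would launch it outside Eclipse, which can help tell an Eclipse configuration problem from a cluster problem.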

Now you should be able to quickly develop, compile, and test your code using Eclipse.
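If a job launched this way fails with strange permission errors, the first thing to check is the user requirement from step 2 of the launcher setup: the username and numeric UID the client runs as must match a user on the cluster. A small sketch for printing what the local JVM sees, using the JDK's com.sun.security.auth.module.UnixSystem class (available on Unix-like systems such as Linux and Mac OS X, not on Windows). LocalIdentity is a hypothetical name. Compare its output with what the id command reports for the same user on a cluster node:

```java
import com.sun.security.auth.module.UnixSystem;

// Hypothetical diagnostic: prints the local user name and numeric UID.
// UnixSystem ships with the JDK but only works on Unix-like systems.
final class LocalIdentity {

  static String describe(UnixSystem unix) {
    return "user=" + unix.getUsername() + " uid=" + unix.getUid();
  }

  public static void main(String[] args) {
    System.out.println(describe(new UnixSystem()));
  }
}
```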

This blog post was published May 25, 2012.
