MapR has released a lightweight Docker container that lets developers conveniently run a single-node MapR cluster on their laptop, so they can access data in the MapR platform directly from IDEs, database clients, and other software development tools. The container is free to use and includes the following components:
maprcli command to configure streams, tables, and file system volumes, and to perform other management tasks.
hadoop fs command to operate on files and directories.
Rather than going into detail about how to set up the container and access everything in it, I'll point to the docs and then illustrate a common workflow that integrates this container into a software development environment.
The installation instructions are here.
The following two repositories describe how to load data into files, tables, and streams and process them with Drill and Spark. These tutorials were written specifically with the MapR Container for Developers in mind:
One of the most common use cases for this container involves running code from an Integrated Development Environment (IDE) that accesses data and invokes analytical engines like Spark or Drill on a MapR cluster. This process typically includes the following steps:
Let's look at an example project that shows this workflow from start to finish. We'll use the code examples in Getting Started with MapR Database JSON, which analyze data in the Yelp Open Dataset. After downloading that dataset, we need to copy it to the MapR XD file system on the container. Typically, files can be copied to MapR XD through an NFS mount point, but NFS is not available in the MapR Developer Container, so we need to use the hadoop fs -put command, like this:
sudo /opt/mapr/bin/hadoop fs -put ~/Downloads/dataset/business.json /tmp
It's easy to confuse the Unix and MapR XD namespaces in the hadoop fs command, so let me clarify. The first parameter to hadoop fs -put references a path in the standard file system on your laptop. The second parameter references a path in the MapR XD cluster file system (here, the /tmp directory). To list files in MapR XD, use hadoop fs -ls <dir>.
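The upload and the follow-up listing can be combined in a small script. The following is a minimal sketch, not part of the original tutorial: `put_and_list`, `run`, and the `DRY_RUN` switch are hypothetical helpers introduced for illustration, and the script assumes the MapR client lives under /opt/mapr/bin, as in the command above.

```shell
#!/usr/bin/env bash
# Sketch: copy a local file into MapR XD, then list the target directory.
# HADOOP, DRY_RUN, run, and put_and_list are illustrative names (assumptions).

HADOOP="${HADOOP:-/opt/mapr/bin/hadoop}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1: path on the laptop's local file system
# $2: destination directory in MapR XD
put_and_list() {
  local src="$1" dst="$2"
  run "$HADOOP" fs -put "$src" "$dst" || return 1
  run "$HADOOP" fs -ls "$dst"
}

# Example (uncomment to run against the container):
# put_and_list ~/Downloads/dataset/business.json /tmp
```

Setting `DRY_RUN=1` prints both commands without touching the cluster, which makes it easy to check which argument is the local path and which is the MapR XD path. If your setup requires sudo, as in the command shown earlier, run the script with the same privileges.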
After we put business.json in MapR XD, we can import that file into a MapR Database table, like this:
sudo /opt/mapr/bin/mapr importJSON -idField business_id -src /tmp/business.json -dst /apps/business -mapreduce false
That command creates a MapR Database table called business in the /apps volume. The default permissions for new tables are restrictive, but we can make the table easier to access from an IDE by granting public access, like this:
ssh root@localhost -p 2222 "maprcli table cf edit -path /apps/business -cfname default -readperm p -writeperm p"
Up to this point, every command we've mentioned can run on the Docker host (i.e. your laptop's Terminal app). However, maprcli commands are not available with the MapR client installed on your laptop, which is why the command above is issued via ssh.
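To confirm that the permission change took effect, the table's column family settings can be listed over the same ssh connection. This is a sketch under assumptions: it relies on the `maprcli table cf list` subcommand, the helper names `cf_list_cmd` and `show_table_perms` are hypothetical, and port 2222 is taken from the ssh command above.

```shell
#!/usr/bin/env bash
# Sketch: list column family settings for a MapR Database table over ssh.
# CONTAINER_SSH_PORT, cf_list_cmd, and show_table_perms are illustrative names.

CONTAINER_SSH_PORT="${CONTAINER_SSH_PORT:-2222}"

# Compose the maprcli command that lists column families for a table path.
cf_list_cmd() {
  printf 'maprcli table cf list -path %s' "$1"
}

# Run the listing inside the container, since maprcli is not on the laptop.
show_table_perms() {
  ssh -p "$CONTAINER_SSH_PORT" root@localhost "$(cf_list_cmd "$1")"
}

# Example (uncomment to run against the container):
# show_table_perms /apps/business
```

The read and write permission columns in the output should show the public setting for the default column family if the earlier edit succeeded.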
Once data has been loaded into the /apps/business table and its access permissions have been set to public, we can open an IDE and access that table programmatically. For example, we can run the DRILL_001_YelpSimpleQuery program with a Run configuration that looks like this in the IntelliJ IDE:
Running that example generates output similar to what's shown below:
That example ran a SQL query programmatically with the Drill API. We can see and rerun that query from the Drill web console as shown below:
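For readers who prefer a terminal to the web console, a query can also be submitted through Drill's REST endpoint (POST /query.json). The sketch below is not taken from the tutorial and rests on several assumptions: that the Drill web server is reachable at http://localhost:8047 without authentication (a secured cluster would need a login step first), that the table is addressable as dfs.`/apps/business`, and that the sample query's name and stars fields are what you want to see. `drill_payload` and `drill_query` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: submit a SQL query to Drill's REST endpoint (POST /query.json).
# DRILL_URL, drill_payload, and drill_query are illustrative names (assumptions).

DRILL_URL="${DRILL_URL:-http://localhost:8047}"

# Wrap a single-line SQL statement in the JSON body Drill expects.
drill_payload() {
  local sql="$1"
  sql=${sql//\\/\\\\}   # escape backslashes for JSON
  sql=${sql//\"/\\\"}   # escape double quotes for JSON
  printf '{"queryType":"SQL","query":"%s"}' "$sql"
}

# POST the query and print Drill's JSON response.
drill_query() {
  curl -s -X POST -H 'Content-Type: application/json' \
       -d "$(drill_payload "$1")" "$DRILL_URL/query.json"
}

# Example (hypothetical query; uncomment to run against the container):
# drill_query 'SELECT name, stars FROM dfs.`/apps/business` LIMIT 5'
```

Keeping the payload builder separate from the curl call makes it possible to inspect the exact JSON body before sending anything to the cluster.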
The following video demonstrates the workflow described above. It shows how an IDE can be used to run and debug an application that uses MapR Database in the MapR Container for Developers.
MapR has released a Docker container that allows developers to set up a single-node MapR cluster on their laptop. This makes it easier than ever to connect a development environment to a cluster for accessing analytical engines such as Spark or Drill and the MapR core components for database, streaming, and file storage.
If you'd like to learn more about the MapR Container for Developers, check out the following resources: