The MapR Docker Container for Developers


MapR has released a lightweight Docker container that enables developers to conveniently run a single-node MapR cluster on their laptop, so they can access data in the MapR platform directly from IDEs, database clients, and other software development tools. The container is free to use and includes the following components:

  • Analytical engines: Drill and Spark. You can use their respective APIs programmatically or with command line tools such as sqlline and spark-submit.
  • MapR Control System: the primary web interface for controlling a MapR cluster. You can also use the maprcli command to configure streams, tables, file system volumes, and other management tasks.
  • MapR Distributed File and Object Store (MapR XD): distributed storage for all data. You can use the hadoop fs command to operate on files and directories.
  • MapR Database: a NoSQL database for JSON and binary data. You can interact with MapR Database using the Open JSON Application Interface (OJAI) API, the HBase API, or the command line tools mapr dbshell and hbase shell.
  • MapR Event Store for Apache Kafka: distributed storage for real-time data. You can publish messages to and consume messages from streams using the Kafka API or the Spark Streaming API.
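Once the container is running, a quick way to confirm these components are up is to run a maprcli sanity check over ssh. This is an illustrative sketch: it assumes the container maps sshd to port 2222 (the same port mapping used later in this post), and the fallback keeps it safe to run when no container is listening.

```shell
# Sanity check: ask the single-node cluster which services are running.
# BatchMode avoids a password prompt when scripted; the || fallback
# keeps the snippet harmless if the container isn't up yet.
MAPR_SSH="ssh -o BatchMode=yes -o ConnectTimeout=5 -p 2222 root@localhost"
SVC=$($MAPR_SSH "maprcli node list -columns svc" 2>/dev/null \
      || echo "container not reachable")
printf '%s\n' "$SVC"
```

On a healthy container the output lists services such as the CLDB, MapR XD fileserver, and Drillbit on the single node.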

Getting Started

Rather than going into detail about how to set up the container and access everything in it, I'm just going to reference the docs and then illustrate a common workflow that integrates this container into a software development environment.

The installation instructions are here.

The following two repositories describe how to load data into files, tables, and streams and process them with Drill and Spark. These tutorials were written specifically with the MapR Container for Developers in mind:

  1. Getting Started with MapR Database JSON
  2. Getting Started with MapR Event Store

Common Workflows

One of the most common use cases for this container involves running code from an Integrated Development Environment (IDE) that accesses data and invokes analytical engines like Spark or Drill on a MapR cluster. This process typically includes the following steps:

  1. Copy initial datasets to the container, or create an application that generates them.
  2. Create a MapR XD volume, MapR Database table, or MapR Event Store topic.
  3. Insert the datasets into that volume, table, or stream.
  4. Develop and test application code within the IDE.

Let's look at an example project that shows this workflow from start to finish. We'll use the code examples in Getting Started with MapR Database JSON, which analyze data in the Yelp Open Dataset. After we download that dataset, we need to copy it to the MapR XD file system on the container. Typically, files are copied to MapR XD through an NFS mount point, but NFS is not available in the MapR Container for Developers, so we need to use the hadoop fs -put command, like this:

sudo /opt/mapr/bin/hadoop fs -put ~/Downloads/dataset/business.json /tmp

It's easy to confuse the Unix and MapR XD namespaces in the hadoop fs command, so let me clarify. The first parameter to hadoop fs -put references a file in the standard file system on your laptop. The second parameter references a volume (i.e., a directory) in the MapR XD cluster file system. To list files in MapR XD, use hadoop fs -ls <dir>.
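A quick way to see the two namespaces side by side is to list /tmp in each. This sketch falls back to the laptop's local /tmp when the MapR client isn't installed, so it is safe to run anywhere:

```shell
# Contrast the MapR XD /tmp volume with the laptop's local /tmp.
HADOOP=/opt/mapr/bin/hadoop
if [ -x "$HADOOP" ]; then
  LISTING=$(sudo "$HADOOP" fs -ls /tmp)   # MapR XD namespace: business.json shows up here
else
  LISTING=$(ls -l /tmp)                   # local Unix namespace, for contrast
fi
printf '%s\n' "$LISTING"
```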

After we put business.json in MapR XD, we can import that file into a MapR Database table, like this:

sudo /opt/mapr/bin/mapr importJSON -idField business_id -src /tmp/business.json -dst /apps/business -mapreduce false

That command creates a MapR Database table called business in the /apps volume. The default permissions for new tables are secure, but we can make the table easier to access from an IDE by granting public read and write access, like this:
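Before wiring up an IDE, it's worth spot-checking that the import worked. One way is to query the table with mapr dbshell inside the container. This is a sketch: it assumes dbshell accepts commands piped on stdin, and the fallback keeps it harmless when the container isn't running.

```shell
# Spot-check the imported table by fetching a couple of JSON documents.
DOCS=$(ssh -o BatchMode=yes -p 2222 root@localhost \
        "echo 'find /apps/business --limit 2' | mapr dbshell" 2>/dev/null \
       || echo "container not reachable")
printf '%s\n' "$DOCS"
```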

ssh root@localhost -p 2222 "maprcli table cf edit -path /apps/business -cfname default -readperm p -writeperm p"

Until now, all the commands we've mentioned could run on the docker host (i.e., your laptop's terminal). However, the maprcli command is not included with the MapR client installed on your laptop, which is why I'm showing it issued over ssh.
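The same ssh pattern works for any maprcli command. For example, if you later want a stream for the Kafka API examples, you could create one with open permissions the same way. The path /apps/mystream is hypothetical, and the fallback keeps the snippet harmless when the container isn't up.

```shell
# Create a stream with public produce/consume/topic permissions,
# mirroring the public table permissions granted above.
OUT=$(ssh -o BatchMode=yes -p 2222 root@localhost \
       "maprcli stream create -path /apps/mystream \
          -produceperm p -consumeperm p -topicperm p" 2>/dev/null \
      || echo "container not reachable")
printf '%s\n' "$OUT"
```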

Once data has been loaded into the /apps/business table and its access permissions have been set to public, we can open an IDE and programmatically access that table. For example, we can run the DRILL_001_YelpSimpleQuery program with a Run configuration that looks like this in the IntelliJ IDE:
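If you want to try the same kind of query without an IDE, you can pipe SQL to sqlline inside the container. This is a sketch, not the program's exact query: the Drill install path, the drillbit=localhost connection string, and the name/stars fields of the Yelp data are assumptions to adjust for your container.

```shell
# Run an ad hoc Drill query against the business table via sqlline.
QUERY='SELECT name, stars FROM dfs.`/apps/business` LIMIT 5;'
ROWS=$(ssh -o BatchMode=yes -p 2222 root@localhost \
        "echo '$QUERY' | /opt/mapr/drill/drill-*/bin/sqlline -u jdbc:drill:drillbit=localhost" \
        2>/dev/null || echo "container not reachable")
printf '%s\n' "$ROWS"
```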


Running that example generates output similar to what's shown below:


That example ran a SQL query programmatically with the Drill API. We can see and rerun that query from the Drill web console as shown below:


The following video demonstrates the workflow described above. It shows how an IDE can be used to run and debug an application that uses MapR Database in the MapR Container for Developers.


MapR has released a Docker container that allows developers to set up a single-node MapR cluster on their laptop. This makes it easier than ever to connect a development environment to a cluster for accessing analytical engines such as Spark or Drill, as well as the MapR core components for database, streaming, and file storage.

If you'd like to learn more about the MapR Container for Developers check out the following resources:

  1. MapR Docs: The MapR Container for Developers
  2. Getting Started with MapR Database JSON
  3. Getting Started with MapR Event Store

This blog post was published January 15, 2018.