Configuring Multiple Drill Clusters and Designating One Cluster as an OJAI Distributed Query Service

As of MapR 6.0 and Drill 1.11, you can run operational queries through the OJAI Distributed Query Service, as well as analytical queries through Drill. To run both operational and analytical workloads in your MapR cluster, you must configure multiple Drill clusters within the MapR cluster and then designate one of those Drill clusters as the OJAI Distributed Query Service. Restricting each workload to its own cluster improves query performance.

Note: Installing Drill and the OJAI Distributed Query Service together through the MapR Installer is not currently supported. MapR supports only one of these services running in the cluster unless you manually install and configure multiple Drill clusters, as instructed here.

Data Distribution

If you install both Drill and the OJAI Distributed Query Service through the MapR Installer, both workloads get processed across the entire MapR cluster. When both services run together in the cluster, the system replicates data across the entire cluster, causing remote reads and impairing performance, which can lead to missed SLAs and memory issues.

Memory Allocation

The amounts of memory allocated to Drill and the OJAI Distributed Query Service differ. By default, when you install Drill, 13 GB of memory is allocated to the Drillbit service running on a node:
  • 8 GB direct
  • 4 GB heap
  • 1 GB code cache
The OJAI Distributed Query Service requires less memory than Drill. By default, the OJAI Distributed Query Service is allocated approximately 5 GB of memory:
  • 1 GB direct
  • 3 GB heap
  • 512 MB code cache

If you use the MapR Installer and select both Drill and the OJAI Distributed Query Service, memory is configured using the Drill defaults. If you run only operational queries, which use less memory than analytical queries, roughly 8 GB of memory on each node is reserved unnecessarily.
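
The default figures above can be checked with a little arithmetic. The following Python sketch is for illustration only: the dictionary and function names are hypothetical, and the values are simply the per-node defaults quoted in this section, expressed in GB.

```python
# Default per-node memory pools quoted in this section, in GB.
# The names below are illustrative, not MapR or Drill configuration keys.
DRILL_DEFAULTS_GB = {"direct": 8.0, "heap": 4.0, "cache": 1.0}
OJAI_DEFAULTS_GB = {"direct": 1.0, "heap": 3.0, "cache": 0.5}


def total_gb(allocation):
    """Sum the memory pools allocated to one service on one node."""
    return sum(allocation.values())


drill_total = total_gb(DRILL_DEFAULTS_GB)
ojai_total = total_gb(OJAI_DEFAULTS_GB)

# Memory reserved beyond what the OJAI Distributed Query Service
# would receive by default on the same node.
overallocated = drill_total - ojai_total

print(f"Drill: {drill_total} GB, OJAI: {ojai_total} GB, "
      f"difference: {overallocated} GB")
```

The difference works out to 8.5 GB per node, which is the source of the "8 GB" figure given above.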

How to Run Drill and the OJAI Distributed Query Service Together in a MapR Cluster

You can manually install Drill on several nodes and divide the nodes into multiple topologies (Drill clusters). For each of the topologies, create and mount a volume. Then, create directories within each volume to store your data. Configure these directories as workspaces in the Drill dfs storage plugin. Finally, configure a Drill cluster to run as an OJAI Distributed Query Service.
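
As a rough illustration of the workspace step, the following Python sketch generates a "workspaces" fragment of the kind used in a Drill dfs storage plugin configuration. Every cluster name, topology, volume path, and directory here is hypothetical. The sketch assumes the standard workspace attributes (location, writable, defaultInputFormat); verify the output against your own plugin configuration before using it.

```python
import json

# Hypothetical layout: one topology, one mounted volume, and a set of data
# directories per Drill cluster. None of these names come from MapR defaults.
CLUSTERS = {
    "analytics": {
        "topology": "/data/drill-analytics",
        "volume_path": "/analytics",
        "dirs": ["sales", "logs"],
    },
    "ojai": {
        "topology": "/data/drill-ojai",
        "volume_path": "/operational",
        "dirs": ["orders"],
    },
}


def build_workspaces(clusters):
    """Map each data directory to a dfs workspace entry."""
    workspaces = {}
    for cluster_name, cfg in clusters.items():
        for directory in cfg["dirs"]:
            workspaces[f"{cluster_name}_{directory}"] = {
                "location": f"{cfg['volume_path']}/{directory}",
                "writable": True,
                "defaultInputFormat": None,  # serialized as JSON null
            }
    return workspaces


workspaces = build_workspaces(CLUSTERS)

# Print the fragment so it can be compared with the "workspaces" section
# of the dfs storage plugin configuration.
print(json.dumps({"workspaces": workspaces}, indent=2))
```

Keeping the cluster name in each workspace name makes it easier to see which Drill cluster, and therefore which topology, a query is meant to run against.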

The following topics provide instructions for each of the required steps: