Drill operations are memory- and CPU-intensive. Currently, Drill resources are managed outside of any cluster management service, such as the MapR warden service. In any type of cluster, multitenant or not, YARN-enabled or not, you configure memory and memory usage limits for Drill by modifying drill-env.sh, as described in the section "Configuring Drill Memory" in the Apache Drill documentation.
Configure a multitenant cluster to account for the resources that Drill requires. For example, on a MapR cluster, ensure that warden accounts for the resources required for Drill. Configuring drill-env.sh allocates resources for Drill to use during query execution, while configuring the following properties in warden.drill-bits.conf prevents warden from committing those resources to other processes.
service.heapsize.min=<some value in MB>
service.heapsize.max=<some value in MB>
service.heapsize.percent=<a whole number>
Set the service.heapsize properties in warden.drill-bits.conf regardless of whether you changed the defaults in drill-env.sh.
"Configuring Drill in a YARN-enabled MapR Cluster" shows an example of setting the service.heapsize properties. The service.heapsize.percent property is the percentage of memory for the service, bounded by the minimum and maximum values. Typically, users change service.heapsize.percent because a percentage setting increases or decreases resources according to each node's configuration. For more information about the service.heapsize properties, see the section "warden.<servicename>.conf."
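The way a percentage bounded by a minimum and maximum adapts to different nodes can be sketched as a simple clamp. This is a minimal illustration, assuming the percentage is applied to the node's total memory and the result is then bounded by the minimum and maximum; the function name and all sample values are hypothetical, and the exact calculation warden performs may differ.

```python
def effective_heap_mb(node_memory_mb, percent, min_mb, max_mb):
    """Return the heap size in MB implied by a percentage bounded by min and max.

    Assumption: the percentage applies to total node memory. This mirrors the
    description of service.heapsize.percent, not warden's actual source code.
    """
    if min_mb > max_mb:
        raise ValueError("service.heapsize.min must not exceed service.heapsize.max")
    requested = node_memory_mb * percent // 100
    # Clamp the requested size into the [min_mb, max_mb] range.
    return max(min_mb, min(requested, max_mb))


# Hypothetical values: identical settings on two differently sized nodes.
small_node = effective_heap_mb(node_memory_mb=32 * 1024, percent=25, min_mb=4096, max_mb=16384)
large_node = effective_heap_mb(node_memory_mb=128 * 1024, percent=25, min_mb=4096, max_mb=16384)
print(small_node, large_node)
```

With these sample values, the smaller node gets a quarter of its memory, while the larger node is capped at the maximum. That is why a single percentage setting can be shared across nodes with different hardware.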
You need to statically partition the cluster to designate which partition handles which workload. To configure resources for Drill in a MapR cluster, modify one or more of the files created by the installation process in
Configure Drill memory by modifying warden.drill-bits.conf in YARN and non-YARN clusters. Configure other resources by modifying warden.resourcemanager.conf in a YARN-enabled cluster.
Configuring Drill in a YARN-enabled MapR Cluster
To add Drill to a YARN-enabled cluster, change memory resources to suit your application. For example, you have 120G of available memory that you allocate to the following workloads in a YARN-enabled cluster:
File system = 20G
HBase = 20G
YARN = 20G
OS = 8G
If YARN does most of the work, give Drill 20G, for example, and give YARN 60G. If you expect a heavy query load, give Drill 60G and YARN 20G.
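When you rebalance memory between workloads, a quick budget check helps catch a plan that commits more memory than the node has. The following is a minimal sketch using the four figures from the list above; the function name is hypothetical and the check is only arithmetic, not part of any Drill or MapR tool.

```python
def unallocated_gb(total_gb, allocations):
    """Return the memory left after subtracting every planned allocation.

    Raises ValueError when the plan commits more memory than is available.
    """
    used = sum(allocations.values())
    if used > total_gb:
        raise ValueError(f"plan over-commits memory: {used}G of {total_gb}G")
    return total_gb - used


# Figures taken from the example list: 120G available on the node.
plan = {"File system": 20, "HBase": 20, "YARN": 20, "OS": 8}
left = unallocated_gb(120, plan)
print(f"Unallocated: {left}G")
```

Adding a Drill entry to the plan and rerunning the check shows whether a proposed split still fits within the available memory.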
YARN consists of two main services:
ResourceManager: There is at least one instance in a cluster, more if you configure high availability.
NodeManager: There is one instance per node.
The warden.resourcemanager.conf and warden.nodemanager.conf files set ResourceManager and NodeManager memory to the following defaults:
Change these settings for NodeManager and ResourceManager to reconfigure the total memory required for YARN services to run. If you want to place an upper limit on memory, set the YARN_NODEMANAGER_HEAPSIZE or YARN_RESOURCEMANAGER_HEAPSIZE environment variable in /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/yarn-env.sh. You do not set the -Xmx option, which allows memory to grow as needed.
MapReduce v1 Resources
The following default settings in /opt/mapr/conf/warden.conf control MapReduce v1 memory:
Modify these settings to reconfigure MapReduce v1 resources to suit your application needs. Remaining memory is given to YARN applications.
MapReduce v2 and Other Resources
You configure memory for each service by setting three values in
Configure memory for other services in the same manner. For more information about managing memory in a MapR cluster, see the following sections:
How to Manage Drill CPU Resources
Currently, you do not manage CPU resources within Drill. Use Linux cgroups to manage the CPU resources.
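As a rough illustration of what a cgroup CPU limit involves, the sketch below computes the quota value for a CPU cap. It assumes the cgroup v1 CPU controller with CFS bandwidth control, where cpu.cfs_quota_us divided by cpu.cfs_period_us gives the number of CPUs a group may use. The function name and the four-core figure are hypothetical, and creating the cgroup and placing the Drillbit process in it are not shown.

```python
def cfs_quota_us(cpu_cores, period_us=100_000):
    """Return the cpu.cfs_quota_us value that caps a cgroup at cpu_cores CPUs.

    Assumption: cgroup v1 CFS bandwidth control, where the period defaults to
    100000 microseconds and quota / period is the allowed number of CPUs.
    """
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return int(cpu_cores * period_us)


# Hypothetical cap: let the Drill cgroup use at most four cores.
quota = cfs_quota_us(4)
print(f"cpu.cfs_period_us=100000 cpu.cfs_quota_us={quota}")
```

A fractional value such as 2.5 cores works the same way, yielding a quota of 250000 microseconds per 100000-microsecond period.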
For more information about configuring Drill, refer to the following Apache Drill documents: