Today we are excited to announce the availability of Drill 1.8 on the MapR Data Platform. As part of the Apache Drill community, we continue to deliver iterative releases of Drill, providing significant feature enhancements along with enterprise readiness improvements based on feedback from a variety of customer deployments.
The current Drill 1.8 version is a production release on MapR and is another important milestone signifying Drill’s steady progress. Here are the key highlights of the release.
Starting with Drill 1.8, customers can deploy and manage Drill as a YARN application alongside other compute frameworks on the MapR cluster. This simplifies the deployment and management of Drill in customer environments involving large clusters. It is important to note that Drill in this mode runs as a long-running service under YARN; it does not spin up YARN containers for every Drill query, because Drill queries must meet interactive SLAs. This differs from MR/Spark batch jobs, where every job execution is launched as a YARN application.
The features of Drill/YARN integration include a new client tool to launch Drill as a YARN application, a new Drill Application Master (AM) that coordinates with the YARN resource manager to get resources for the Drill service, CPU and memory controls on the Drill service, the ability to easily add and remove nodes from the Drill cluster, and the ability to launch multiple Drill clusters in a single MapR cluster. New web console features have been introduced to help manage Drill deployments under YARN. Please refer to the documentation to learn more. Stay tuned for future blog posts where we will dive into how to configure and use this feature.
Here are a couple of screenshots of the Drill/YARN integration in the web UI.
Enhanced Query performance
Improved metadata cache performance
Metadata cache pruning for queries involving a large number of partitions (Drill-4786)
Optimizations on reading the metadata cache for queries on a single partition (Drill-4530)
INFORMATION_SCHEMA query performance on Hive tables - This enhancement optimizes the calls made to the Hive metastore to retrieve metadata, thereby reducing overhead on query planning.
Monitoring via JMX & MapR Spyglass (Drill-4564)
A variety of Drill metrics are now available via JMX to make monitoring of Drill production deployments easier. Users can monitor these metrics with any JMX monitoring tool, such as JConsole, or through the Drill web console. Additionally, Drill is now integrated with MapR Monitoring. With this feature, users can capture these metrics and build custom dashboards to observe trends across a variety of system and query metrics, making it easier to manage the health of the Drill cluster and to diagnose and troubleshoot issues. Sample JMX-based Drill metrics include drill.queries.running, drill.queries.completed, heap.used, direct.used, and waiting.count. For more information on Drill monitoring, refer to the documentation here and here.
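For readers who would rather script against JMX than click through JConsole, here is a minimal sketch using only the standard `javax.management` API. It rests on several assumptions: the class name `JmxMetricLister` and its helper methods are hypothetical, the JMX service URL (if you pass one) is whatever your own Drillbit JVM exposes, and this post does not give the MBean object names that Drill registers, so the sketch simply lists every MBean it finds rather than guessing them. With no arguments it inspects the local JVM as a stand-in, and the only attribute it reads is the standard JVM heap usage.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper class, not part of Drill.
public class JmxMetricLister {

    // Returns the sorted canonical names of all MBeans matching an ObjectName pattern.
    public static Set<String> listMBeans(MBeanServerConnection conn, String pattern)
            throws Exception {
        Set<String> names = new TreeSet<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    // Reads a single attribute from a single MBean.
    public static Object readAttribute(MBeanServerConnection conn, String objectName,
                                       String attribute) throws Exception {
        return conn.getAttribute(new ObjectName(objectName), attribute);
    }

    // Prints every registered MBean name, then the standard JVM heap usage.
    static void report(MBeanServerConnection conn) throws Exception {
        for (String name : listMBeans(conn, "*:*")) {
            System.out.println(name);
        }
        CompositeData heap =
                (CompositeData) readAttribute(conn, "java.lang:type=Memory", "HeapMemoryUsage");
        System.out.println("JVM heap used (bytes): " + heap.get("used"));
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 1) {
            // Assumption: args[0] is a JMX service URL for a JVM with remote JMX enabled,
            // in the standard form service:jmx:rmi:///jndi/rmi://<host>:<port>/jmxrmi
            try (JMXConnector connector =
                         JMXConnectorFactory.connect(new JMXServiceURL(args[0]))) {
                report(connector.getMBeanServerConnection());
            }
        } else {
            // No URL given: inspect this JVM's own platform MBean server instead.
            report(ManagementFactory.getPlatformMBeanServer());
        }
    }
}
```

Pointed at a Drillbit, the listing should show where metrics such as drill.queries.running appear in that deployment, and `readAttribute` can then be called with the object and attribute names you actually observe.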
Below is a screenshot of a sample Drill MapR Monitoring dashboard.
A variety of new SQL and usability features have been introduced as part of the 1.8 release. These include:
There are many additional exciting features in Drill 1.8. Download the MapR release and try it out!
If you have any additional questions about Drill 1.8, please ask them in the comments section below.