Apache Drill 1.8 Released on MapR Data Platform

Today we are excited to announce the availability of Drill 1.8 on the MapR Data Platform. As part of the Apache Drill community, we continue to deliver iterative releases of Drill, providing significant feature enhancements along with enterprise readiness improvements based on feedback from a variety of customer deployments.

The current Drill 1.8 version is a production release on MapR and is another important milestone signifying Drill’s steady progress. Here are the key highlights of the release.

  • Integration with YARN (Available in MapR Platform only)

Starting with Drill 1.8, customers can deploy and manage Drill as a YARN application alongside other compute frameworks on the MapR cluster. This simplifies the deployment and management of Drill in customer environments involving large clusters. Note that in this mode Drill runs as a long-running service under YARN and does not spin up YARN containers for every query, because Drill queries must meet interactive SLAs. This differs from MapReduce and Spark batch jobs, where every job execution is launched as its own YARN application.

The features of Drill/YARN integration include a new client tool to launch Drill as a YARN application, a new Drill Application Master (AM) that coordinates with the YARN resource manager to obtain resources for the Drill service, CPU and memory controls on the Drill service, the ability to easily add and remove nodes from the Drill cluster, and the ability to launch multiple Drill clusters in a single MapR cluster. New web console features have been introduced to help manage Drill deployments under YARN. Please refer to the documentation to learn more. Stay tuned for future blog posts where we will dive into how to configure and use this feature.

Here are a couple of screenshots of the Drill/YARN integration in the web UI.

[Screenshot: Drill cluster status running under YARN]

[Screenshot: Management page to resize Drill cluster]

  • Enhanced Query performance

    • Partition pruning enhancements to evaluate query filters at the leaf directory level rather than at files (Drill-4589). This will significantly help with planning performance for queries on large numbers of files.
    • Improved metadata cache performance

      • Metadata cache pruning for queries involving a large number of partitions (Drill-4786)

      • Optimizations on reading the metadata cache for queries on a single partition (Drill-4530)

    • INFORMATION_SCHEMA query performance on Hive tables - This enhancement optimizes the calls made to the Hive metastore to retrieve metadata, thereby reducing overhead on query planning.
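To make the partition pruning change more concrete, here is a minimal Python sketch of the general idea: evaluating a directory filter once per leaf directory instead of once per file. This is not Drill's planner code. The directory layout, the function names, and the predicate are hypothetical, chosen only to show why the number of filter evaluations drops when many files share a directory.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def prune_by_file(files, predicate):
    """Evaluate the directory filter once per file (the costlier approach)."""
    evaluations = 0
    kept = []
    for f in files:
        evaluations += 1
        if predicate(PurePosixPath(f).parent.parts):
            kept.append(f)
    return kept, evaluations


def prune_by_leaf_dir(files, predicate):
    """Group files by leaf directory and evaluate the filter once per directory."""
    by_dir = defaultdict(list)
    for f in files:
        by_dir[PurePosixPath(f).parent].append(f)
    evaluations = 0
    kept = []
    for directory, members in by_dir.items():
        evaluations += 1
        if predicate(directory.parts):
            kept.extend(members)
    return kept, evaluations


# Hypothetical layout: /logs/<year>/<month>/part-<n>.parquet
files = [
    f"/logs/{year}/{month:02d}/part-{i}.parquet"
    for year in (2015, 2016)
    for month in (1, 2)
    for i in range(100)
]


def is_2016(parts):
    # parts looks like ('/', 'logs', '2016', '01'); this mimics a filter on the
    # first directory level below /logs (similar in spirit to dir0 = '2016').
    return parts[2] == "2016"


kept_per_file, evals_per_file = prune_by_file(files, is_2016)
kept_per_dir, evals_per_dir = prune_by_leaf_dir(files, is_2016)
print(evals_per_file, evals_per_dir, len(kept_per_dir))
```

Both functions keep the same files, but the directory-level version evaluates the filter once for each of the four leaf directories rather than once for each of the 400 files. That gap grows with the number of files per directory.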

  • Monitoring via JMX & MapR Spyglass (Drill-4564)

A variety of Drill metrics are now available via JMX to make monitoring of Drill production deployments easier. Users can monitor these metrics with any JMX monitoring tool, such as JConsole, or through the Drill web console. Additionally, Drill is now integrated with MapR Monitoring. With this feature, users can capture these metrics and build custom dashboards to observe trends across a variety of system and query metrics, making it easier to manage the health of the Drill cluster and troubleshoot issues. Sample JMX-based Drill metrics include drill.queries.running, drill.queries.completed, heap.used, direct.used, and waiting.count. For more information on Drill monitoring, refer to the documentation.
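As a rough illustration of how such metrics could feed a simple health check, the Python sketch below flattens a metrics snapshot and flags values above operator-chosen limits. The payload shape (gauges carrying a value, counters carrying a count), the sample numbers, the thresholds, and the function names are all assumptions made for this example, not Drill's documented output. Only the metric names come from the list above.

```python
def flatten_metrics(payload):
    """Collapse an assumed {"gauges": ..., "counters": ...} payload into name -> number."""
    flat = {}
    for name, gauge in payload.get("gauges", {}).items():
        flat[name] = gauge["value"]
    for name, counter in payload.get("counters", {}).items():
        flat[name] = counter["count"]
    return flat


def find_breaches(flat, limits):
    """Return the metrics whose current value exceeds the configured limit."""
    return {
        name: flat[name]
        for name, limit in limits.items()
        if name in flat and flat[name] > limit
    }


# Illustrative snapshot; the values and the payload shape are made up for this sketch.
snapshot = {
    "gauges": {
        "heap.used": {"value": 3_500_000_000},
        "direct.used": {"value": 1_200_000_000},
    },
    "counters": {
        "drill.queries.running": {"count": 12},
        "drill.queries.completed": {"count": 480},
    },
}

# Hypothetical alert thresholds chosen by the operator.
limits = {"heap.used": 3_000_000_000, "drill.queries.running": 20}

breaches = find_breaches(flatten_metrics(snapshot), limits)
print(breaches)
```

In this made-up snapshot only heap.used exceeds its limit, so it is the only metric reported. A real deployment would read the values from a JMX client or a monitoring dashboard rather than from an inline dictionary.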

Below is a screenshot of a sample Drill MapR Monitoring dashboard.

[Screenshot: sample Drill dashboard in MapR Monitoring (Spyglass)]

  • Additional enhancements

A variety of new SQL and usability features have been introduced as part of the 1.8 release. These include:

  • HBase 1.x support (Drill-4199)
  • Multibyte line delimiters for Text reader (Drill-3149)
  • Return directory associated with a workspace on the fly (Drill-4514)
  • Ability to return file names as part of queries
  • Hive CHAR data type support
  • DROP TABLE IF EXISTS SQL command support
  • Support for nested aggregate expressions for window aggregates
  • Improvements to MaxDir/MinDir functions
  • Split function
  • Access to Drill logs in the web UI
  • Addition of JDBC/ODBC client IP in Drill audit logs
  • And a lot more improvements and bug fixes
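To illustrate what multibyte line delimiters involve for a text reader, here is a small Python sketch that splits a stream of byte chunks on a two-byte delimiter. It is not Drill's text reader. The function name, the chunking, and the sample data are hypothetical. The point it shows is that a delimiter longer than one byte can straddle two read buffers, so the reader has to carry over unconsumed bytes.

```python
def read_records(chunks, delimiter=b"\r\n"):
    """Yield records from an iterable of byte chunks, splitting on a multi-byte delimiter.

    A delimiter may be split across two chunks, so bytes that do not yet form
    a complete record are carried over to the next iteration.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last delimiter is complete; the tail is kept.
        *complete, buffer = buffer.split(delimiter)
        yield from complete
    if buffer:
        yield buffer  # final record without a trailing delimiter


# The first chunk ends in the middle of the "\r\n" delimiter.
chunks = [b"id,name\r", b"\n1,alice\r\n2,", b"bob\r\n"]
records = list(read_records(chunks))
print(records)
```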

There are many additional exciting features in Drill 1.8. Download the MapR release and try it out!

How to get started with Drill:

For full documentation, please refer to http://drill.apache.org/docs. Additional resources can be found at https://mapr.com/apachedrill.

If you have any additional questions about Drill 1.8, please ask them in the comments section below.

This blog post was published September 14, 2016.
