Today, the Apache Drill community announced the release of Drill 0.9, and MapR is very excited to package this release as part of the MapR Distribution including Hadoop. With its continued project momentum and rapid adoption by enterprises and the open source community, the Apache Drill project is expected to be generally available (1.0 GA) in the Q2 ’15 time frame, and Drill 0.9 is the last significant iterative release paving the way to this big milestone. As always, you can read the official release notes for more details.
The Drill 0.9 release includes 195 resolved JIRAs. Some of the noteworthy improvements include:
Security: One of the key features in this release is support for security. Drill can now perform basic authentication to validate users using a configured PAM (LDAP, etc.) on the cluster. Along with authentication, the validated user identity can now be passed to the underlying file system using the Drill impersonation feature. With the combination of authentication, impersonation, and Drill views, Drill provides a logical, granular, and decentralized security model for organizations with a variety of security needs, combining self-service data exploration with the required level of IT governance.
Stability: Strengthening the core stability required for enterprise deployments continues to be a key focus for the project, just like in previous releases. Numerous improvements have been made with respect to memory handling, scalability, cancellation and avoiding query hangs when Drillbits are taken down or during unexpected query conditions.
Debuggability/Usability: A variety of usability enhancements have been made to the project to improve the user experience. For example, Create Table As (CTAS) now shows the query plan in the web UI, just like SELECT queries. All the file readers (such as Parquet and Text) now report which file and record caused the problem when an error occurs. Additionally, clear error messages are included for various unsupported features or unexpected usage of the product.
SQL improvements: Several SQL enhancements are available as part of this release. The key items include: support for a large set of values (several thousand) in a SQL IN clause, support for queries containing scalar subqueries without correlation, support for implicit casts between numeric data types in joins, support for queries with a correlated EXISTS containing an IN subquery, an execution-side implementation of nested loop join operators, as well as a variety of improvements in performing advanced SQL on nested data/JSON.
Performance: Continued performance improvements mean that Drill, as an interactive tool, makes Hadoop data more broadly available to business users in organizations. Example performance improvements in the release include the ability to query partition information without reading all data, support for constant folding in filter expressions (so expressions are evaluated once in the planning phase and not repeated for every row during execution), and a variety of metadata improvements.
Data sources and data types: Improvements include better support for dictionary-encoded data in Parquet, extended JSON support (an extended list of data types beyond standard JSON, including Date, Timestamp, Float, etc., has been added and is available for JSON as well as MongoDB data sources), support for reading a large number of columns, as well as the availability of an early version of an AVRO file format plugin.
ODBC/JDBC driver improvements: Driver enhancements make Drill more compatible with a variety of BI tools such as Tableau, MicroStrategy, QlikView, and Spotfire.
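To make the constant folding item under Performance more concrete, here is a minimal Python sketch of the general technique. It is not Drill's planner code: the tuple-based expression tree, the `fold` and `evaluate` functions, and the `amount` column are all hypothetical and exist only for illustration. The idea is that a subexpression made up only of constants is computed once during planning, so per-row execution only has to compare a column against the precomputed value.

```python
import operator

# Hypothetical mini expression tree (not Drill's internal representation):
#   ("const", value), ("col", name), or (op, left, right) with op in OPS.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def fold(expr):
    """Planning phase: collapse subtrees that contain only constants."""
    kind = expr[0]
    if kind in ("const", "col"):
        return expr
    left, right = fold(expr[1]), fold(expr[2])
    if left[0] == "const" and right[0] == "const":
        # Both operands are known at planning time, so compute the result once.
        return ("const", OPS[kind](left[1], right[1]))
    return (kind, left, right)


def evaluate(expr, row):
    """Execution phase: evaluate the (already folded) expression for one row."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "col":
        return row[expr[1]]
    return OPS[kind](evaluate(expr[1], row), evaluate(expr[2], row))


# Filter equivalent to: amount > 10 * (2 + 3)
predicate = (
    ">",
    ("col", "amount"),
    ("*", ("const", 10), ("+", ("const", 2), ("const", 3))),
)

folded = fold(predicate)  # the right-hand side becomes a single constant
rows = [{"amount": 40}, {"amount": 75}]
matches = [row for row in rows if evaluate(folded, row)]
```

After folding, the arithmetic on the right-hand side is no longer part of the tree, so `evaluate` performs a single comparison per row instead of recomputing the multiplication and addition each time.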
To download the Apache Drill packages, visit https://package.mapr.com/releases/ecosystem-4.x/.
To learn more about Apache Drill, visit https://mapr.com/products/apache-drill.