January 12, 2016
Works with MapR Distribution for Apache Hadoop version
Noteworthy New Features in the MapR Distribution of Drill
This release introduces a number of enhancements, including the following ones:
- Improved Tableau experience with faster Limit 0 queries (MD-602)
- Metadata (INFORMATION_SCHEMA) query performance improvements on Hive schemas/tables (MD-530, MD-504, MD-507, DRILL-4126, DRILL-4127, DRILL-4161)
- Optimized Parquet metadata caching, faster queries on a large number of files
Metadata caching is new feature introduced in Drill 1.2 to speed up Drill queries involving large number of files. In 1.4, the cache size is considerably reduced using a variety of techniques including efficient storage of information as well reducing the redundant metadata. This significantly boosts performance of the queries. Note however that the depending on the size of data, the cache construction might take longer.
- Partition pruning improvements
Partition pruning is significantly enhanced in 1.4. The enhancements including support for more datatypes, working better in conjunction with metadata caching as well as a moving the partition pruning evaluation from Hep to Volcano planner. (DRILL-4126, DRILL-3765)
- Improved Window functions resource usage and performance
- Updated 1.2.1 Windows ODBC driver (updates to Mac and Linux versions to follow)
- New and improved JDBC driver for MapR Drill
This driver is recommended by MapR for production deployments deployments and partner certifications. The new driver offers much more compatibility to the JDBC standard. This driver compliments the currently available open source version of JDBC driver.
- Stability improvements
- 90 bug fixes and additional enhancements listed in Apache Drill 1.4 Release Notes and Apache Drill 1.3 Release Notes
- Drill was upgraded to use a new Parquet reader in the Drill 1.3 dev preview release. Customers moving from Drill 1.2 and prior versions to 1.4 must migrate their data or regenerate their parquet in order to continue working with Drill in optimized way. For more information on how to upgrade the parquet, refer to the migration tool documentation.
- Merge join in 1.4 has known issues related to performance and data correctness in 1.4 (DRILL-3578). Hence, upgrading to 1.4 is not recommended for performant queries that use merge joins. If you do upgrade to 1.4, you must set the planner.enable_mergejoin option to false (that is, use hash join instead); otherwise, merge join queries will not run and/or produce incorrect results.
- Extract function with seconds returns incorrect data when you issue queries from the ODBC driver. (MD-604)
- Complex data functions FLATTEN and KVGEN return null data when you issue queries from the ODBC driver. (MD-608)
- After upgrading to Release 1.4 or later, any custom function you wrote in Release 1.3 or earlier that is in a package that Drill scans by default will run. However, if you created your own package for the custom function, it will not run in Release 1.4 and later. You can use one of the following workarounds. The first workaround is quick and easy. The second is recommended for new UDFs you create in Release 1.4 and later.
- Create a drill-module.conf file in $DRILL_HOME/conf/ if one does not already exist.
Add the following code to the drill-module.conf, replacing com.yourgroupidentifier.udf with the package name(s) of your UDFs.
- Copy the drill-module.conf file to all Drillbits.
- Ensure that DRILL_HOME/conf/drill-override.conf does not contain any information regarding UDF packages.
- Restart Drill on all the Drillbits.
Add the following code to the drill-module.conf in your UDF project (src/main/resources/drill-module.conf). Replace com.yourgroupidentifier.udf with the package name(s) of your UDFs.
Recompile the UDFs and place the 2 jar files (source + binary) in Drill's classpath on all the Drillbits.
Ensure that DRILL_HOME/conf/drill-override.conf does not contain any information regarding UDF packages.
Restart drill on all the Drillbits.
- If you upgrade from Drill 1.2 or earlier, update Parquet data.
Drill 1.4 is the first production ready release after the Drill upgrade to use the latest version of the Apache Parquet library, which was released in 1.3 dev preview. If you generated Parquet using Drill 1.2 and earlier, take one of the following actions:
- Update data using the Drill migration utility.
- Regenerate data using the new version of Drill
One of these actions is mandatory to continue to get accurate functionality and to optimize performance.
- Regardless of whether impersonation is enabled or not, this release caches data in the HiveMetaStore to improve performance. (DRILL-4126)