6 min read
We just released yet another version of Drill, our wildly popular, open source, SQL-on-anything query engine. Apache Drill 1.14 is now available in MapR 6.1 as part of MEP 6.0 (MapR Expansion Pack). As we continue to execute on the core themes for Drill, you'll see improved query performance and resource utilization with the following new features and improvements:
An improved Web UI
Ability to provide storage plug-in definitions at boot time
LATERAL .. UNNEST is an ANSI SQL construct that efficiently performs operations on nested data.
UNNEST is a table function applied on arrays. UNNEST acts like a SQL for each loop because the function iterates over each array element, applying any functions, limits, or aggregations on the array elements before associating them with the parent row through LATERAL.
Consider the following example:
Using LATERAL … UNNEST improves performance by an order of magnitude. Intelligently pushing down aggregate, filter, and limit conditions onto child elements of a complex type reduces the need to explode the array elements into separate records.
Starting in MapR v6.1, secondary indexes can be created on complex types, like ARRAYs and MAPs, stored in MapR Database. Index keys can be defined on entire ARRAYs themselves or on elements of arrays, even if they are MAPs. Drill can leverage these indexes to get low-latency document lookups when querying these elements.
Consider the following example:
How good is this, anyway? Testing shows significant performance improvements, as well as a significantly reduced memory footprint since fewer rows need to be processed in memory.
Drill processes data in-memory to provide high performance; however, memory is a limited resource that must be made available to all tenants in a cluster. If you run Drill in a multi-tenant cluster, you can configure Drill to use a fixed amount of memory such that any additional memory allocation requests from operations spill to disk. Spilling data to disk prevents Drill queries from failing with OOM errors when running multiple queries in a crowded cluster.
Starting in Drill 1.14, the Hash Join operator can spill to disk. Previously, only the Sort and Hash Aggregate operators spilled data to disk. The Hash Join operator is one of the most frequently seen operators in any query with a join. By adding spill to disk support for the Hash Join operator, you have more fine-grained control over the memory for in-memory processing and spill to disk.
You can run all queries in memory, limiting the level of concurrency in memory-constrained clusters, or you can configure the amount of memory spilling operators can use before they spill data to disk, which frees up system memory to support higher concurrency.
Batch sizing – introduced in Drill 1.13 (see Drill 1.13 blog) – specifies the size of the batches that Drill processes internally. Increasing the batch size allocates more memory up front, which reduces operator back and forth and increases overall performance, but at a higher memory utilization. Decreasing the batch size reduces the memory utilization of a query, but at a slightly higher compute cost. With dynamic batch sizing, you have more control over how much memory queries use. In this release, the Hash Join, Hash Aggregate, Project, and Union operators can be configured for a variable batch size.
The Parquet Scan operator has been enhanced to:
In these examples, Drill pushes down the predicates from one side of the join to the other side of the join. This now works regardless of how many joins there are in the query. Performance improvements observed for certain queries are shown in the tables below:
Storage plug-in handling improvements (DRILL-4580)
SP definitions now read through a file provided during Drillbit startup
Ex: SELECT col1, ANY_VALUE(col2) from <table> group by col1
picks 1 random value of col2 for every group of col1
And you'll find numerous more improvements. Download the latest version of Drill from our website.
The MapR Drill Product Team
Stay ahead of the bleeding edge...get the best of Big Data in your inbox.