Apache Drill

Apache Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google’s Dremel, with the additional flexibility needed to support a broader range of query languages, data sources and data formats, including nested, self-describing data.

Drill offers the following benefits:

  • Flexibility: Drill can read many kinds of data, including nested and schema-less data. It supports queries against many schema-less data sources, including HBase, Cassandra and MongoDB. Naturally flat records are included as a special case of nested data.
  • Speed: Drill is optimized for interactive applications, and thus is designed to process petabytes of data and trillions of records in seconds.
  • Compatibility: Unlike other SQL-like interfaces to Hadoop, such as Hive, Impala, and Shark, Drill does not expose a HiveQL interface to users and applications. In order to achieve the highest level of compatibility with traditional databases, Drill exposes an ANSI-compliant SQL interface.
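
To make the point about flat records being a special case of nested data concrete, here is a minimal Python sketch. It is not Drill code and does not use any Drill API; the helper name `get_path` and the sample records are hypothetical, chosen only to illustrate the idea that one accessor can serve both nested and flat records.

```python
from typing import Any


def get_path(record: dict, path: str) -> Any:
    """Resolve a dotted path such as 'address.city' against a nested record.

    Hypothetical helper for illustration only; not part of Drill.
    """
    value: Any = record
    for key in path.split("."):
        # A missing field yields None rather than an error, because
        # schema-less records need not share the same fields.
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


nested = {"name": "Ada", "address": {"city": "London", "zip": "NW1"}}
flat = {"name": "Bob", "city": "Paris"}

# The same accessor serves both shapes: a flat record is simply
# a nested record whose paths all have depth one.
cities = [get_path(nested, "address.city"), get_path(flat, "city")]
missing = get_path(flat, "address.city")
```

Under these assumptions, a query layer only needs one notion of "field path"; flat sources require no separate handling.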

Why is MapR involved in the Drill Project?

MapR is recognized as the leading Hadoop innovator and is dedicated to providing the best big data processing capabilities. MapR is committed to a highly transparent, open source project so that the best architecture can be put in place to ensure a high-quality, flexible solution. This includes developing and defining open APIs to ensure a robust ecosystem. Apache Drill represents a huge leap forward for organizations looking to augment their big data processing with interactive queries across massive data sets, with a focus on schema-less and nested data, which is an unmet need in the SQL-on-Hadoop market today. Driving Drill as an open source project reduces the barriers to adopting a new set of big data APIs.

How is Apache Drill different from Apache HBase™?

Drill provides a distributed execution engine for interactive queries. HBase represents a supported data source for Drill.

How is Apache Drill different from Apache Hive, Pig and Cascading?

Today these systems compile higher-level languages (e.g., HiveQL, Pig Latin) into MapReduce jobs. Once Drill is available, these systems may support Drill as an underlying low-latency execution engine, enabling interactive queries across billions of records. Chris Wensel, the author of Cascading, is collaborating with MapR on this project and is one of the initial committers.


Apache Drill Page

Apache Drill Wiki

Apache Drill Issue Tracker

Apache Drill Users Blog

Apache Drill Alpha/Milestone 1 Source Download

Download Sandbox for Hadoop

GitHub - MapR

MapR Developer Central