Apache Drill has been gaining significant user adoption and community momentum since its initial Beta release in September 2014. The generally available version, Drill 1.0, was released in May 2015, and numerous customers have deployed Drill in production since then. In this blog post, I will briefly summarize some of the key capabilities that customers find most valuable in Drill. I’ll also cover common use cases where Drill is deployed, as well as resources for getting started with Drill.
Why Drill is compelling for customers
1) Drill provides SQL access on any type of data, with extreme flexibility and ease of use
With Drill, you can query data in files, a Hive data warehouse, HBase tables, or even storage systems outside Hadoop in just a few minutes, and you can combine data from these sources on the fly. There’s no need to define and maintain central metadata definitions: Drill queries data in situ and discovers its schema on the fly. In addition to comprehensive SQL support built on an advanced SQL parser (Apache Calcite), Drill provides SQL extensions to natively query and manipulate complex data types such as arrays and maps, which are common in newer big data sources such as website clickstreams, social media, and sensor data. Drill also ships with ODBC/JDBC drivers, so it can easily be plugged into BI tools such as Tableau and MicroStrategy for wide use across the organization.
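To make schema-on-read and nested-data handling concrete, here is a minimal Python sketch, assuming newline-delimited JSON input. It is not Drill code: `discover_schema` and `flatten` are hypothetical helpers that learn a record's fields at read time, with no predefined metadata, and expand an array into one row per element, loosely mirroring what Drill's FLATTEN function does in SQL.

```python
import json
from typing import Any, Dict, Iterable, Iterator

def discover_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, set]:
    """Infer field names, and the Python type names seen for each field,
    by scanning the records themselves (no central metadata required)."""
    schema: Dict[str, set] = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

def flatten(records: Iterable[Dict[str, Any]], field: str) -> Iterator[Dict[str, Any]]:
    """Emit one row per element of the array in `field`, copying the other
    fields. In this sketch, a missing or empty array produces no rows."""
    for record in records:
        for element in record.get(field) or []:
            row = dict(record)
            row[field] = element
            yield row

# Hypothetical clickstream records; the second one carries an extra nested field.
RAW = """\
{"user": "u1", "pages": ["/home", "/cart"]}
{"user": "u2", "pages": ["/home"], "device": {"os": "ios"}}
"""

records = [json.loads(line) for line in RAW.splitlines()]
schema = discover_schema(records)        # fields are found, not declared
rows = list(flatten(records, "pages"))   # one row per visited page
```

Note that the `device` field appears in the inferred schema even though only one record has it, which is the kind of schema variation a schema-on-read engine has to tolerate.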
2) Drill provides low-latency performance at scale
Drill is a distributed, columnar SQL query engine built from the ground up for complex data. It doesn’t use MapReduce, Tez, or Spark. Drill can run on a single node or scale horizontally to tens, hundreds, or thousands of nodes, depending on the number of users to be supported, the performance SLAs to be met, and the amount of data that needs processing. Along with scale, Drill is built for performance: its in-memory columnar execution engine, designed for optimistic processing of short queries, is combined with advanced, pluggable optimizations including partition pruning, operator pushdown, and rule-based and cost-based query rewriting. Together, these capabilities make Drill a powerful interactive query tool in the big data ecosystem.
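Partition pruning and column projection are easiest to see on a toy example. The Python sketch below illustrates the ideas under simple assumptions and is not Drill's planner: files are partitioned by directory (year, then month), each directory level is exposed as dir0, dir1, and so on, following the way Drill names directory levels, and a filter on those values lets the engine skip whole directories before opening any file. `scan_columns` then builds rows from only the requested columns of a column-oriented table. All paths and function names here are hypothetical.

```python
import posixpath
from typing import Callable, Dict, List, Sequence

def directory_columns(path: str, root: str) -> Dict[str, str]:
    """Expose each directory level below `root` as dir0, dir1, ..."""
    relative = posixpath.relpath(path, root)
    levels = relative.split("/")[:-1]  # the last component is the file name
    return {f"dir{i}": level for i, level in enumerate(levels)}

def prune(paths: Sequence[str], root: str,
          predicate: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Keep only files whose directory values satisfy the filter;
    pruned files are never opened."""
    return [p for p in paths if predicate(directory_columns(p, root))]

def scan_columns(column_store: Dict[str, list], columns: Sequence[str]) -> List[dict]:
    """Build rows from the requested columns only; other columns are not read."""
    selected = [column_store[c] for c in columns]
    return [dict(zip(columns, values)) for values in zip(*selected)]

FILES = [
    "/data/logs/2014/12/part-0.parquet",
    "/data/logs/2015/05/part-0.parquet",
    "/data/logs/2015/06/part-0.parquet",
    "/data/logs/2015/06/part-1.parquet",
]

# Roughly equivalent to: WHERE dir0 = '2015' AND dir1 = '06'
june_2015 = prune(FILES, "/data/logs",
                  lambda d: d.get("dir0") == "2015" and d.get("dir1") == "06")

# One file's data stored column by column; only two of three columns are read.
table = {"user": ["u1", "u2"], "page": ["/home", "/cart"], "latency_ms": [12, 40]}
projected = scan_columns(table, ["user", "latency_ms"])
```

Both steps reduce I/O before any row is processed, which is why they matter so much for interactive latency.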
3) Drill provides a granular and decentralized security model
Views in Drill typically serve as the unit of management for granular row- and column-level access control on Hadoop data. Unlike views in other SQL technologies and tools, Drill views are decentralized entities, maintained simply as files on the file system (users choose the file system location in which to create a view as part of the query). This means views can be secured using file system permissions, without the need to stand up a separate security repository for managing permissions.
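The following Python sketch illustrates the idea under stated assumptions. The JSON layout written by the hypothetical `save_view` helper is simplified and is not Drill's actual view file format; the workspace, view name, SQL text, and table are made up. The point is that securing a view is an ordinary `chmod` on a file, while the view's projection and filter (modeled here by `apply_view`, a Python stand-in for the view's SQL) decide which columns and rows its readers see.

```python
import json
import os
import stat
import tempfile
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]

def save_view(directory: str, name: str, sql: str, mode: int = 0o640) -> str:
    """Store a simplified, hypothetical view definition as a plain file and
    restrict it with file-system permissions (no separate security store)."""
    path = os.path.join(directory, f"{name}.view.drill")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"name": name, "sql": sql}, f)
    os.chmod(path, mode)  # 0o640: owner read/write, group read, others nothing
    return path

def apply_view(rows: List[Row], columns: Sequence[str],
               row_filter: Callable[[Row], bool]) -> List[Row]:
    """Row-level control via the filter, column-level control via the projection."""
    return [{c: r[c] for c in columns} for r in rows if row_filter(r)]

orders = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0, "card_number": "XXXX-1111"},
    {"order_id": 2, "region": "APAC", "amount": 75.5, "card_number": "XXXX-2222"},
]

with tempfile.TemporaryDirectory() as workspace:
    view_path = save_view(
        workspace, "emea_orders",
        "SELECT order_id, amount FROM dfs.`/data/orders` WHERE region = 'EMEA'",
    )
    view_name = os.path.basename(view_path)
    view_mode = stat.S_IMODE(os.stat(view_path).st_mode)

# Readers of the view never see card numbers or non-EMEA rows.
emea_view = apply_view(orders, ["order_id", "amount"], lambda r: r["region"] == "EMEA")
```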
Additionally, Drill supports user impersonation. Having a shared system or process user access the data is not acceptable in many environments; with impersonation, views and data are accessed under the identity of the specific user instead. Drill also offers powerful ownership-chaining capabilities that control how many levels of nested views a given user can access, so organizations can strike a balance between self-service data exploration and controlled governance.
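Ownership chaining can be modeled as counting identity switches while resolving nested views. In this hypothetical Python sketch, each view's body runs as the view's owner, each change of identity counts as one hop, and resolution fails once the hops exceed a limit. Drill exposes a comparable limit through its `max_chained_user_hops` impersonation setting; the default of 3 used below is an assumption of the sketch. Only the hop count is modeled, not the file permission check made at each step.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class View:
    name: str
    owner: str

class TooManyHopsError(Exception):
    """Raised when a view chain needs more identity switches than allowed."""

def resolve_identity(query_user: str, chain: List[View],
                     max_hops: int = 3) -> Tuple[str, int]:
    """Walk nested views from outermost to innermost and return the identity
    that finally reads the underlying data, plus the number of hops taken.
    max_hops=3 is an assumed default for this sketch."""
    identity = query_user
    hops = 0
    for view in chain:
        # The view body executes as its owner (impersonation); a different
        # owner means the effective identity switches, which costs one hop.
        if view.owner != identity:
            hops += 1
            if hops > max_hops:
                raise TooManyHopsError(
                    f"{view.name}: {hops} hops exceeds the limit of {max_hops}"
                )
            identity = view.owner
    return identity, hops

# alice queries a view owned by bob, which selects from a view owned by carol.
chain = [View("sales_summary", "bob"), View("sales_clean", "carol")]
final_identity, hop_count = resolve_identity("alice", chain)
```

Here the base data is ultimately read as carol rather than as a shared service account, and lowering `max_hops` to 1 would make the same chain fail.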
Use cases for Drill
At a broader level, the use case for Drill is to provide self-service BI and ad hoc queries on data stored in a Hadoop data lake or data hub. Several more specific use cases fall under this umbrella, and below are some common usage patterns we see customers applying Drill to in their environments. Note that customers often run a mix of these use cases simultaneously, depending on their data processing and reporting requirements.
The Drill community is making rapid progress on the product with iterative releases. Soon after the GA release delivered the core foundation, Drill 1.1 followed in July (refer to the release notes), building on the feature set to support the above use cases, with continued improvements in SQL support, performance, scale, and enterprise manageability. The Drill 1.2 release brings more exciting enhancements for you to check out as well.
How to get started with Drill
Do you have any questions about Apache Drill? Ask them in the comments section below.