During a recent trip to the Asia Pacific region, I was astonished at the excitement already building around Apache Drill; equally amazing is how quickly adoption of Drill is growing. The groundswell of support we are seeing within the community was validated today by the Apache Software Foundation's announcement that Drill is now a Top-Level Project. You’ll find plenty of documentation on the Apache Drill project site, but let me share the glowing feedback I received from customers and system integrators, and explain why these customers are so excited about this new tool:
- Self-service for data: Hadoop has always been a powerful platform, but until Drill, self-service data exploration was not available to analysts. Drill is the first interactive query platform for Hadoop to support full ANSI-standard SQL. Analysts use tools like Tableau, BusinessObjects, MicroStrategy and Excel to create queries, build dashboards and explore data, and ANSI compliance means that the SQL generated by these tools will run as expected. Incomplete SQL implementations too often fail to execute that generated SQL, and once analysts have wasted time with incomplete tools, they quickly look for more complete solutions.
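To make this concrete, here is a sketch of the kind of ANSI-standard SQL a BI tool might generate; the table and column names are hypothetical, and the `hive` prefix assumes a Drill storage plugin of that name backed by the Hive Metastore:

```sql
-- Hypothetical orders and customers tables registered via the Hive Metastore.
-- A BI tool might generate a standard join with aggregation like this,
-- and an ANSI-compliant engine runs it unmodified:
SELECT c.region,
       COUNT(*)      AS order_count,
       SUM(o.amount) AS total_amount
FROM hive.orders o
JOIN hive.customers c ON o.customer_id = c.id
WHERE o.order_date >= DATE '2014-01-01'
GROUP BY c.region
HAVING SUM(o.amount) > 10000
ORDER BY total_amount DESC;
```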
- Data agility: Analysts can quickly access new data sources and change data sources without enlisting traditional DBA services to structure the data first. Unlike traditional platforms, which require structure to be defined on write, Drill requires structure only on read. Drill works both with the Hive Metastore and with self-describing data sources such as JSON files, HBase tables and CSV files. Schema is applied on read for all data sources: bindings to the Hive Metastore occur at query compile time, while the self-describing formats use more dynamic and agile runtime data bindings.
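As an illustration of querying a self-describing source directly, here is a sketch of a Drill query over a raw JSON file; the file path and field names are hypothetical, and no table definition or DBA involvement is needed:

```sql
-- Query a JSON file in place via Drill's dfs storage plugin.
-- /data/customers.json and its fields are hypothetical examples.
SELECT t.name,
       t.address.city AS city   -- nested JSON fields are addressed with dot notation
FROM dfs.`/data/customers.json` t
WHERE t.address.state = 'CA';
```

The schema here is discovered at read time from the JSON records themselves, which is what makes swapping in a new data source a matter of changing the path in the query.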
- Interactive query response time and scale: The data warehouse market frequently uses interactive query response time as one of several platform selection criteria. Results vary with the size of the query's result table: result tables that fit in memory return faster than those that spill to disk. Drill's algorithms keep result sets in memory when possible, but also deliver good response times for larger result tables that spill to disk; this is big data, after all! TPC benchmarks can be used to measure these response times, and you'll find that Drill can run more of those benchmark queries without modification than any other SQL-on-Hadoop project.
- Ubiquity: Drill on Spark has been announced, and Drill on MongoDB was recently developed in the community. I’m sure we’ll see more companies integrate Drill into their platforms over the coming months.
There are many more reasons why Drill is such a valuable and innovative technology for interactive data exploration on big data. Get started today using the quick start guide, and refer to the Apache Drill web site to learn more. For more examples of how to use Drill, download the Apache Drill sandbox and try out the sandbox tutorial.
Have you used Drill yet? We’d love to hear from you in the comments section below.
This blog post was published December 02, 2014.