Apache Drill Achieves 1st Milestone Release

November 08, 2013 | BY Ted Dunning

The open source incubator project Apache Drill has just made its first release, a significant milestone on the road to graduating to a top-level Apache Software Foundation project. This is a big step that represents a lot of work by the Drill engineering contributors who built the software, by core Drill committers, and by the Apache Drill community who participated in the code review and voting process for the release. It isn’t often recognized, but much of the value that the Apache Software Foundation provides is a standard of due diligence in release packaging and dependency license checking. Maintaining this high standard is a crucial factor in making it possible for businesses to adopt open source software.

Getting a brand-new project to the point of making Apache-level releases is non-trivial, and it is a critical step in Apache incubation. Apache Drill has now taken this step and made the Drill Alpha release available.

What is the status of Apache Drill for users?

Apache Drill is in very active development, with contributions being made by a wide community of participants. With this first release, Drill is at a stage where early users can try it out. While Drill is not ready for production use, it is ready for early feedback from potential users. It would be especially helpful to the project and the community for users to try the software now and provide constructive feedback, either informally via the Apache Drill user mailing list or in a more involved way, such as defining a project of query testing and working with Drill engineers to report and examine results of such queries.

Users choose from a variety of data sources, pose a query, and set Drill into action.

What is the status of Apache Drill’s underlying technology?

Drill has two major goals. The primary one is to support users who need advanced ad hoc querying capability. The second is to develop technology that can be used in Drill itself or in other projects. At this point, Drill has already demonstrated the following technical capabilities:

  • Initial functionality of a dynamically typed SQL parser
  • Distributed query via the SQL, logical plan and physical plan layers
  • High performance networked processing of in-memory columnar data
  • Code generation for SQL execution from Java-based template definitions
  • Execution of queries with nested data having dynamic schemas

Several of these capabilities are unique to Drill. If you are curious how these work, join the mailing list or the weekly Google hangout.

What are the next steps for Apache Drill?

The Drill team (break out the pom-poms!) is working hard at building community and building code. The next few milestone releases should lead to code that is suitable for early production use. In particular, over the next few months you should expect to see the following:

  • Additional milestone releases
  • New committers
  • Full operation of the dynamically typed SQL parser
  • Integration of Optiq’s cost-based optimizer
  • Initial query performance and throughput results

Technically speaking, this first milestone breaks new ground in the handling of self-describing data. Coming soon will be the ability to work with a combination of self-describing data and data under centralized schema management such as HCatalog. Both styles of meta-data management are important. With centralized management you can have database administrators who carefully curate tabular schemas. This is good if you have a heavily linked table structure inherited from a data warehouse, for instance. On the other hand, self-describing data allows a much more fluid style of data management. With no central meta-data store to consult, performance can actually be much higher than with centralized management for very parallel workloads or very many tables. Self-describing data often also goes hand in hand with nested data, somewhat like the document-oriented data supported by MongoDB. Drill already handles self-describing data with limited forms of nesting. Upcoming releases will extend this to general forms of nesting as well as full support for centralized meta-data.
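To make the contrast between the two styles of meta-data management a little more concrete, here is a minimal sketch in Python. It is not Drill code and does not reflect how Drill is implemented; the sample records, the `CENTRAL_SCHEMA` table, and both function names are hypothetical, chosen only for illustration. The first function discovers field names and types from each nested record as it is read, which is the self-describing style. The second checks a record against a single curated, flat schema, which is the centralized style.

```python
import json

# Hypothetical self-describing records: each one carries its own structure,
# so nothing outside the record is needed to interpret it.
RAW_RECORDS = [
    '{"id": 1, "name": "alice", "address": {"city": "San Jose", "zip": "95113"}}',
    '{"id": 2, "name": "bob", "tags": ["admin", "ops"]}',
    '{"id": "3", "address": {"city": "Oakland"}}',
]

# Hypothetical curated schema, standing in for what a central
# meta-data store would hold for a flat table.
CENTRAL_SCHEMA = {"id": "int", "name": "str"}


def discover_schema(value, prefix=""):
    """Return {dotted_path: type_name} discovered from one nested value."""
    if isinstance(value, dict):
        schema = {}
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            schema.update(discover_schema(child, path))
        return schema
    if isinstance(value, list):
        schema = {}
        for child in value:
            # "[]" marks a repeated (array) field in the path.
            schema.update(discover_schema(child, prefix + "[]"))
        return schema
    return {prefix: type(value).__name__}


def conforms_to_central_schema(record, schema):
    """Centralized style: the record must match the curated columns exactly."""
    return set(record) == set(schema) and all(
        type(record[column]).__name__ == type_name
        for column, type_name in schema.items()
    )


if __name__ == "__main__":
    for line in RAW_RECORDS:
        record = json.loads(line)
        print(discover_schema(record),
              conforms_to_central_schema(record, CENTRAL_SCHEMA))
```

In this sketch every sample record yields a different discovered schema, and none of them fits the curated flat table, because each has extra nested fields, a missing column, or a differently typed `id`. That is the kind of fluidity the self-describing style allows and the centralized style deliberately rules out.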

How can others join in?

You can join the development of Apache Drill in many different ways. If you are good at coding, we need you. If you are good at documenting, we want you. If you have other special talents, speak up! In all of these cases, one way to get started is to join the mailing list and speak up. Another way is to attend the meetups and to hear about what is happening with Drill and share ideas. The Bay Area Apache Drill User Group recently met in San Jose to discuss the status of the project at the first milestone release. Details are found on the meetup site.

  • Join the Drill User Group: http://www.meetup.com/Bay-Area-Apache-Drill-User-Group/
  • Check out the Apache Drill project at: http://incubator.apache.org/drill/
  • Get involved: http://incubator.apache.org/drill/index.html#get_involved
  • Follow on Twitter: @ApacheDrill
  • Find additional discussions and updates at the Apache Drill User site: http://drill-user.org/

About Ted: Ted Dunning is MapR's Chief Application Architect and Apache Drill Project Champion.