Apache Drill

Schema-Less Query Execution for Self-Service BI SQL Analytics at Scale

WHAT IS APACHE DRILL?

Apache Drill is an open source distributed SQL query engine integrated into the MapR Data Platform that delivers fast and secure self-service BI SQL analytics at scale. Drill's distributed shared-nothing architecture enables incremental scale out with low-cost hardware to meet the increasing demands of query response and user concurrency.

Drill was designed from the ground up to support high-performance analysis on the semistructured and rapidly evolving data coming from modern big data applications, while still providing the familiarity and ecosystem of industry standard ANSI SQL.

Big Data SQL: Overview of Apache Drill Query Execution Capabilities | Whiteboard Walkthrough

Learn how Apache Drill achieves low latency for interactive SQL queries on large datasets. With Drill, you can use familiar ANSI SQL BI tools, such as Tableau or MicroStrategy, and also explore big data directly.

Explore 30+ videos to learn more about Drill.

WHY APACHE DRILL?

BASIC CHALLENGES WITH OTHER SQL SOLUTIONS

  • Other SQL solutions rely on the traditional process of manually creating schemas or metadata definitions in a centralized store. Expensive ETL (extract, transform, and load) routines need to be performed upfront in order to transform the data into a format the SQL engine can ingest. Data consumers need to wait before data can be made available to them. Data owners need to double the storage footprint in order to store data in its original format and in the ingested format.
    This means big data analytics has to slow down to wait for long IT cycles, limiting the opportunity for end users to quickly explore new datasets or make real-time decisions, effectively diminishing the power of big data analytics itself.
  • Other SQL solutions require data to be ingested into a managed, proprietary storage format that handles complex data types and rapidly changing data formats poorly. Proprietary storage formats also mean vendor lock-in.
  • Other SQL solutions lack the ability to dynamically size compute clusters independent of storage clusters.
  • With other SQL solutions, switching to a highly available configuration is a manual process.
  • Other solutions lack an integrated web-based UI with which all common developer and administrator tasks can be carried out.

AGILITY, FLEXIBILITY, AND FAMILIARITY WITH APACHE DRILL

  • With Drill, users can query the raw data in situ. There's no need to load the data, create and maintain schemas, or transform the data before it can be processed. Instead, simply include the path to a Hadoop directory, MongoDB collection, or S3 bucket in the SQL query.
    Drill supports a variety of SQL and NoSQL databases and file systems, including MapR Database, MapR XD Distributed File and Object Store, HDFS, HBase, MongoDB, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS, and local files. A single query can join data from multiple datastores.
  • Drill has out-of-the-box support for delimited text, JSON, Hive formats, Parquet, and Kafka streams. Because of this, Drill avoids vendor lock-in.

    Drill features a JSON data model that enables queries on complex/nested data as well as rapidly evolving structures commonly seen in modern applications and non-relational datastores.

    Drill is the only columnar query engine that supports complex data. It features an in-memory shredded columnar representation for complex data, which allows Drill to achieve columnar speed with the flexibility of an internal JSON document model.

    Drill also provides intuitive extensions to SQL so you can easily query complex data.
  • With Drill, cluster size can be dynamic. When launched under YARN, a Drill cluster can be sized, grown, or shrunk independently of the underlying storage cluster's size. Drill also does not require the compute cluster to run on the same nodes as the storage cluster.
  • When Drill plans and executes a query, all operations are distributed only across the available Drill nodes in the cluster. Every Drill node can accept, plan, and perform distributed query execution. When a query is submitted, an available node in the cluster is chosen as the foreman. The foreman then factors in the number of available nodes before it creates a distributed query plan. When nodes go down, they are automatically excluded from query planning and execution. Likewise, when they come back up, no manual intervention is required to reinstate them in the cluster. This avoids manual administrative corrections when individual nodes are lost.
  • Drill's built-in lightweight web UI includes the ability to:
    • Submit or cancel queries
    • View cluster status and drillbit availability
    • Edit storage plug-in definitions
    • View query plans, query execution details, and execution thread statistics
    • Export data to a delimited format
    • View/modify system parameters
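
The foreman behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Drill's actual planner: the `Drillbit` class, the `plan_query` function, the round-robin assignment, and the rule that the first available node acts as foreman are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Drillbit:
    """Hypothetical stand-in for one Drill node (not a real Drill class)."""
    name: str
    available: bool = True


def plan_query(nodes, work_units):
    """Toy planner: pick a foreman, then spread work over available nodes only."""
    live = [n for n in nodes if n.available]
    if not live:
        raise RuntimeError("no available Drill nodes")
    foreman = live[0]  # assumption: any available node may act as foreman
    assignments = {n.name: [] for n in live}
    # Round-robin the work units across the nodes that are currently up.
    for i, unit in enumerate(work_units):
        assignments[live[i % len(live)].name].append(unit)
    return foreman.name, assignments


cluster = [Drillbit("node-a"), Drillbit("node-b"), Drillbit("node-c")]
units = ["part-0", "part-1", "part-2", "part-3"]

cluster[1].available = False   # node-b goes down and is skipped by the planner
foreman, plan = plan_query(cluster, units)

cluster[1].available = True    # node-b returns; the next plan simply includes it
foreman_after, plan_after = plan_query(cluster, units)
```

In this sketch nothing has to be reconfigured when a node leaves or rejoins, because each plan is built from whichever nodes are available at planning time.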

KEY BENEFITS OF MAPR AND DRILL

HIGH-PERFORMANCE SCALE

High-performance scale with support for thousands of users across thousands of nodes running queries on data in the terabyte and petabyte range.

ANSI SQL COMPLIANCE

All the SQL analytics functionality – aggregates, filters, sorting, sub-queries (scalar and correlated), create table/view as – is available out of the box.

SCHEMA-LESS QUERY EXECUTION

Discover schemas on the fly and enable immediate exploration of data stored in MapR Data Platform across a variety of data formats and sources.

SELF-SERVICE ANALYTICS

Self-service analytics empowers employees to make decisions with access to business and market insights.

ALWAYS-ON INSIGHTS

Out-of-the-box high availability and disaster recovery ensure your business continues to benefit from timely insights.

END-TO-END SECURITY

End-to-end security by default with industry standard authentication mechanisms (PAM, Kerberos, and MapR Security) and state-of-the-art encryption to protect sensitive data with SSL and AES 256 GCM support.

INTEGRATION WITH MAPR DATABASE FOR OPERATIONAL ANALYTICS

Native integration with MapR Database, including secondary indexes, achieves up to 10X query performance improvement.

IN-PLACE ANALYTICS ACROSS HISTORICAL AND ANALYTICAL DATA

In-place analytics, instead of moving data across various clusters, saves valuable time and enables faster decisions and actions.

CONNECTIVITY

Connectivity with popular BI tools, such as Tableau, MicroStrategy, Qlik, and many more, through ODBC and JDBC.

WHY DRILL MATTERS TO YOU

DATA ARCHITECT

  • Enable business analysts to develop ad-hoc queries and reports on data being moved from a data warehouse as well as new data being stored in a data lake or data hub.
  • Enable self-service interactive BI at a fraction of the cost of traditional systems.

CHIEF DATA OFFICER

  • Query data in situ without moving it to an RDBMS query engine and without upfront schemas.
  • Extend the capabilities of SQL with user-defined functions (UDFs).
  • Extend user authentication with custom authenticators to meet unique security requirements of every organization.
  • Query just about any data source with custom storage and format plug-ins using well documented, self-serve resources available online at drill.apache.org.

DATA ANALYSTS

  • Query almost any enterprise data source using ANSI SQL without needing to transport the data from where it resides or needing to bind the data to a specific schema before querying it.
  • Quickly discover insights using a lightweight web interface to run queries against any enterprise data source.

DATA SCIENTISTS

  • Direct integration into the MapR Data Science Refinery enables self-service data exploration and discovery, making data scientists more productive.
  • User-defined function (UDF) frameworks extend the functionality of SQL to support applications that would otherwise not have been possible.
  • Dynamic UDFs let you develop and drop UDFs on the fly without requiring a cluster or drillbit restart.

DEVELOPERS

  • Develop apps faster with the universally accepted ANSI SQL dialect using industry-standard ODBC, JDBC, and REST API interfaces. Use the lightweight web interface to develop, troubleshoot, performance tune, and optimize applications without requiring help from an administrator or anyone else. Enable unhindered self-service application development. Use Drill's embedded mode to develop applications on a local workstation and ensure they are optimized before promoting them to run on a cluster.
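
As a rough illustration of the REST interface, the sketch below builds a query request for Drill's REST endpoint using only the Python standard library. It assumes a Drill instance whose web port is the default `localhost:8047` and a storage plug-in named `dfs`; the file path and the helper name `build_drill_query_request` are hypothetical. The request is built but not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request


def build_drill_query_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical file path; `dfs` is assumed to be a configured storage plug-in.
# The path to the raw file goes directly into the FROM clause.
sql = "SELECT * FROM dfs.`/data/logs/events.json` LIMIT 10"
request = build_drill_query_request(sql)

# Against a live Drill instance, the request could be sent like this:
#     with urllib.request.urlopen(request) as resp:
#         rows = json.load(resp)["rows"]
```

Keeping request construction separate from sending makes the query payload easy to inspect or log before it reaches a cluster.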

SELF-SERVICE CUSTOMERS

  • In recent years, Drill's adoption has been increasing in self-service analytics-as-a-service use cases. Because Drill was built with high scale and concurrency in mind, customers are able to dynamically provision clusters, expand or shrink them as required, and expose their data assets. In addition, customers are now able to monetize their data assets by selling analytics as a service to their own customers through industry-standard BI tool interfaces like Tableau, MicroStrategy, and QlikView.
  • Drill has increasingly seen adoption in analytics as a service (AaaS) solutions, which offer businesses an alternative to building internal hardware setups to perform analytics.

Apache Drill SQL Query Optimization | Whiteboard Walkthrough

Learn how Apache Drill optimization achieves interactive performance for low-latency SQL queries on very large datasets when working with familiar BI tools such as Tableau, MicroStrategy, or QlikView. The walkthrough also covers techniques for successfully optimizing Drill in production.

We are very excited about the new features [in MapR], Spark structured streaming allows us to use advanced analytics on real-time oil well data while Drill allows us to explore the same data using SQL. This helps us make operational decisions faster.

Eric Keister, advanced analytics and emerging technologies manager at Anadarko

CUSTOMERS USING APACHE DRILL

TransUnion
Sanchez