Drill JDBC Driver Features

The Drill JDBC driver provides the following features:

Catalog and schema support
The Drill JDBC driver supports both catalogs and schemas so that it can work with a variety of JDBC applications. Following convention, the driver adds a synthetic catalog named DRILL under which all schemas and databases are organized. The driver also maps the JDBC schema to the DRILL catalog.
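For example, once connected through the driver you can list the schemas organized under the DRILL catalog with a standard Drill command (the schema names returned, such as dfs.default or hive.default, depend on your storage plugin configuration):

```sql
-- Lists all schemas available under the synthetic DRILL catalog
SHOW SCHEMAS
```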
Schema types and file formats
The Drill JDBC driver supports the following schema types:
  • HBase
  • Hive
  • Distributed File System (DFS), supporting the following file formats:
    • Parquet
    • JavaScript Object Notation (JSON)
    • Comma-Separated Values (CSV)
    • Tab-Separated Values (TSV)
Specifying column names
Use of the SQL asterisk (*) selector to return all columns in a table is limited. You must modify queries to specify the columns to return. Refer to the following syntax guidelines for each schema type or file format and modify your queries as needed. Note that when you write queries for Drill, you must enclose all table and schema names in backticks (`).
Note: The MapR ODBC Driver for Drill provides an application called Drill Explorer that helps generate queries for Drill. However, the Drill JDBC driver does not currently support Drill Explorer. If you would like to simplify the process of formulating valid queries for Drill, consider using Drill Explorer with the ODBC driver.
HBase
The Drill JDBC driver presents an HBase column family as a JDBC column. An HBase column is referenced using the lookup syntax ColumnFamily['Column'], where ColumnFamily is an HBase column family and Column is an HBase column within that family. You must specify individual HBase columns, not column families, so you need some familiarity with the data. For example, the following query may return incorrect results:
SELECT * FROM `hbase`.`students`
When the query is modified to specify individual columns, then the query results are correct:
SELECT CAST(account['name'] AS varchar(20)) FROM `hbase`.`students`
Parquet
When a query is used as a subquery, such as when you use a SQL view to connect to a data source using a BI tool, you must specify column names in the query. For example, the BI tool may not connect to your Drill data source successfully when using the following query:
SELECT * FROM `dfs`.`default`.`./opt/drill/test.parquet`
When the query is modified to specify column names, the BI tool connects to your Drill data source successfully using the SQL view:
SELECT CAST(column1 AS varchar(20)) FROM `dfs`.`default`.`./opt/drill/test.parquet`
JSON
When a query is used as a subquery, such as when you use a SQL view to connect to a data source using a BI tool, you must specify column names in the query. For example, the BI tool may not connect to your Drill data source successfully when using the following query:
SELECT * FROM `dfs`.`default`.`./opt/drill/interval.json`
When the query is modified to specify column names, the BI tool connects to your Drill data source successfully using the SQL view:
SELECT column1 FROM `dfs`.`default`.`./opt/drill/interval.json`
To query nested elements, use the following syntax, where menu is a child of column1:
SELECT column1['menu'] FROM `dfs`.`default`.`./opt/drill/interval.json`
You can query elements that are multiple levels deep. Continuing the example, if menuitem is a child of menu, then use the following syntax:
SELECT column1['menu']['menuitem'] FROM `dfs`.`default`.`./opt/drill/interval.json`
CSV/TSV
You must specify columns using indices, which requires some familiarity with the data. For example, the following query may return incorrect results:
SELECT * FROM `dfs`.`default`.`./root/Test/Table.csv`
When the query is modified to specify columns using indices, the query results are correct:
SELECT columns[0], columns[2] FROM `dfs`.`default`.`./root/Test/Table.csv`
Hive
When using a Hive schema, you can use standard queries without modifying them.
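For example, assuming a hypothetical Hive table named students, the asterisk selector works without modification because Hive tables have defined columns:

```sql
-- Hive tables carry column metadata, so SELECT * needs no rewriting
SELECT * FROM `hive`.`students`
```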
Casting binary data
Drill can work with self-describing data such as HBase and file systems without central metadata definitions. In some scenarios, the file formats do not have defined data types for the data. For example, HBase always treats data as binary. Drill provides auxiliary functions to cast (or interpret) the data as certain data types. You must explicitly cast binary data to another format to view the data. You must also enclose all table and schema names in backticks (`). The following examples show you how to cast data as a specific data type. For more information about the SQL queries in Drill, refer to the Apache Drill documentation.

The following query displays results from an HBase database in binary format:

SELECT account['name'] FROM `hbase`.`students`

The following query displays the same results in string format:

SELECT CAST(account['name'] AS varchar(20)) FROM `hbase`.`students`

The following query displays results from a Parquet file in binary format:

SELECT column1 FROM `dfs`.`default`.`./opt/drill/test.parquet`

The following query displays the same results in string format:

SELECT CAST(column1 AS varchar(20)) FROM `dfs`.`default`.`./opt/drill/test.parquet`
Data types
The Drill JDBC driver supports many common data types, converting between Drill, SQL, and Java representations. Drill supports the following data type mappings:
Drill Type   SQL Type    Java Type
BIGINT       BIGINT      BIGINT
BIT          BOOLEAN     BOOLEAN
DATE         DATE        DATE
FLOAT4       FLOAT       REAL
FLOAT8       DOUBLE      DOUBLE
INT          INT         INTEGER
SMALLINT     SMALLINT    SMALLINT
TIMESTAMP    TIMESTAMP   TIMESTAMP
TINYINT      TINYINT     TINYINT
VARCHAR      VARCHAR     VARCHAR
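To illustrate these mappings, the following sketch reuses the example Parquet file from earlier and assumes hypothetical column names (col1, col2, col3); it casts each column to one of the SQL types listed above:

```sql
SELECT CAST(col1 AS BIGINT)      AS id,     -- maps to Drill BIGINT
       CAST(col2 AS DOUBLE)      AS score,  -- maps to Drill FLOAT8
       CAST(col3 AS VARCHAR(20)) AS name    -- maps to Drill VARCHAR
FROM `dfs`.`default`.`./opt/drill/test.parquet`
```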