Drill JDBC Driver 1.5.3

New Features

The Drill JDBC 1.5.3 driver provides the following new features:

More Flexible URL Format
The URL format for Drillbit and ZooKeeper connections no longer restricts connection properties to the front of the URL.
Custom URL Format
Connection properties that are not part of the driver settings are now supported; any such properties entered in the connection string are passed to the server for processing.
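As a sketch of how such a connection string might be assembled, the following Java snippet appends both driver settings and pass-through properties to a base URL. The base-URL forms and the property names (`schema`, `myCustomProp`) are illustrative assumptions, not documented settings; consult the driver's installation guide for the names it recognizes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DrillUrlExample {
    // Build a Drill JDBC URL by appending name=value pairs to the base URL.
    // Base URLs commonly take the form jdbc:drill:zk=<host>:<port>/<path>
    // or jdbc:drill:drillbit=<host>:<port>; both are examples here.
    public static String buildUrl(String base, Map<String, String> props) {
        StringBuilder url = new StringBuilder(base);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("schema", "dfs");          // a driver setting (illustrative)
        props.put("myCustomProp", "value1"); // not a driver setting: passed to the server
        System.out.println(buildUrl("jdbc:drill:zk=zk1:2181/drill/cluster-id", props));
        // A real connection would then be opened with
        // DriverManager.getConnection(url) -- requires a running Drill cluster.
    }
}
```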

Known Issues

The following are known issues for the Drill JDBC 1.5.3 driver. You might encounter these issues due to limitations in the data source, the driver, or an application.

Parameterized Queries
Parameterized queries are not supported; you cannot use queries that contain parameters.
ZooKeeper Connection
On Windows, the driver returns an error when attempting to connect to a local ZooKeeper quorum. To resolve this issue, add the IP address of the ZooKeeper quorum to the Windows hosts file. The driver is then able to connect to the local ZooKeeper quorum successfully.
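For example, if the quorum node were reachable at 192.168.1.10 under the host name zk-node-1 (both values hypothetical), the added entry in C:\Windows\System32\drivers\etc\hosts would look like:

```text
192.168.1.10    zk-node-1
```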

Features

Drill JDBC driver features released with previous driver versions include the following:

Security and Authentication
To protect data from unauthorized access, some Drill data stores require connections to be authenticated using user credentials. As of Drill 1.10, the MapR Drill JDBC driver supports the authentication protocols used in Drill by providing a mechanism that allows you to authenticate your connection using MapR-SASL, your Drill user name and password, or Kerberos.
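As an illustration, the snippet below assembles connection properties for user name/password (Plain) authentication. The property names AuthenticationType, UID, and PWD are assumptions based on common Simba-based driver conventions, not names confirmed by this document; check the driver's installation guide for the exact settings.

```java
import java.util.Properties;

public class DrillAuthExample {
    // Assemble connection properties for Plain (user name/password)
    // authentication. Property names here are illustrative.
    public static Properties plainAuth(String user, String password) {
        Properties props = new Properties();
        props.setProperty("AuthenticationType", "Plain");
        props.setProperty("UID", user);
        props.setProperty("PWD", password);
        return props;
    }

    public static void main(String[] args) {
        Properties props = plainAuth("drilluser", "secret");
        System.out.println(props.getProperty("AuthenticationType"));
        // DriverManager.getConnection(url, props) would then authenticate
        // against the Drillbit -- requires a running, secured cluster.
    }
}
```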
Upgraded Functionality for Including and Excluding Specific Schemas
When the ExcludedSchemas property is set, the driver is now able to exclude schemas based on Tables metadata in addition to SchemaOnly metadata. Also, you can now configure the new IncludedSchemas property instead to specify a list of schemas to include based on SchemaOnly and Tables metadata.
Reduced Storage Footprint
A number of library files have been removed from the driver package, reducing the size of the driver package file without affecting driver functionality.
Default Schema Specification Support
You can specify a default database schema for a connection, which the driver uses for queries that do not explicitly specify a schema. To do this, set the Schema connection property to the name of the database schema.
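A minimal sketch of setting this property in Java follows; the Schema property name comes from the description above, while the schema value hive.default is only an example.

```java
import java.util.Properties;

public class DrillSchemaExample {
    // Build connection properties that set a default schema for the
    // connection. The schema value passed in is caller-supplied.
    public static Properties withDefaultSchema(String schema) {
        Properties props = new Properties();
        props.setProperty("Schema", schema);
        return props;
    }

    public static void main(String[] args) {
        Properties props = withDefaultSchema("hive.default");
        System.out.println(props.getProperty("Schema"));
        // Passing these properties to DriverManager.getConnection(url, props)
        // makes queries without an explicit schema resolve against hive.default.
    }
}
```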
Catalog and Schema Support
The Drill JDBC driver supports both catalogs and schemas to make it easy for the driver to work with various JDBC applications. The Drill JDBC driver follows convention, adding a synthetic catalog named DRILL under which all of the schemas and databases are organized. The driver also maps the JDBC schema to the DRILL catalog.
Schema Types and File Formats
The Drill JDBC driver supports the following schema types:
  • HBase
  • Hive
  • Distributed File System (DFS), supporting the following file formats:
    • Parquet
    • JavaScript Object Notation (JSON)
    • Comma-Separated Values (CSV)
    • Tab-Separated Values (TSV)
Specifying Column Names
Use of the SQL asterisk (*) selector to return all columns in a table is limited. You must modify queries to specify the columns to return. Refer to the syntax guidelines below for your schema type or file format, and modify your queries as needed. Note that when you write queries for Drill, you must enclose all table and schema names in backticks (`example`).
Note: The MapR ODBC driver for Drill provides an application called Drill Explorer that helps generate queries for Drill. However, the Drill JDBC driver does not currently support Drill Explorer. To simplify the process of formulating valid queries for Drill, consider using Drill Explorer with the ODBC driver.
HBase
The Drill JDBC driver presents an HBase column-family as a JDBC column. An HBase column is derived using the lookup scheme ColumnFamily['Column'] where ColumnFamily is an HBase column-family and Column is an HBase column contained in ColumnFamily. You must specify individual HBase columns, not column families, so you need to have some familiarity with the data. For example, the following query may return incorrect results:
SELECT * FROM `hbase`.`students`
When the query is modified to specify individual columns, then the query results are correct:
SELECT CAST(account['name'] AS varchar(20)) FROM `hbase`.`students`
Parquet
When a query is used as a subquery, such as when you use a SQL view to connect to a data source using a BI tool, you must specify column names in the query. For example, the BI tool may not connect to your Drill data source successfully when using the following query:
SELECT * FROM `dfs`.`default`.`./opt/drill/test.parquet`
When the query is modified to specify column names, the BI tool connects to your Drill data source successfully using the SQL view:
SELECT CAST(column1 AS varchar(20)) FROM `dfs`.`default`.`./opt/drill/test.parquet`
JSON
When a query is used as a subquery, such as when you use a SQL view to connect to a data source using a BI tool, you must specify column names in the query. For example, the BI tool may not connect to your Drill data source successfully when using the following query:
SELECT * FROM `dfs`.`default`.`./opt/drill/interval.json`
When the query is modified to specify column names, the BI tool connects to your Drill data source successfully using the SQL view:
SELECT column1 FROM `dfs`.`default`.`./opt/drill/interval.json`
To query nested elements, use the following syntax, where menu is a child of column1:
SELECT column1['menu'] FROM `dfs`.`default`.`./opt/drill/interval.json`
You can query elements that are multiple levels deep. Continuing the example, if menuitem is a child of menu, then use the following syntax:
SELECT column1['menu']['menuitem'] FROM `dfs`.`default`.`./opt/drill/interval.json`
CSV/TSV
You must specify columns using indices, which requires some familiarity with the data. For example, the following query might return incorrect results:
SELECT * FROM `dfs`.`default`.`./root/Test/Table.csv`
When the query is modified to specify columns using indices, the query results are correct:
SELECT columns[0], columns[2] FROM `dfs`.`default`.`./root/Test/Table.csv`
Hive
When using a Hive schema, you can use standard queries without modifying them.
Casting Binary Data
Drill can work with self-describing data, such as HBase tables and files in a file system, without central metadata definitions. In some scenarios the file format does not define data types for the data; for example, HBase always treats data as binary. Drill provides auxiliary functions to cast (or interpret) the data as specific data types. You must explicitly cast binary data to another format to view it, and you must enclose all table and schema names in backticks (`). The following examples show how to cast data as a specific data type. For more information about SQL queries in Drill, refer to the Apache Drill documentation.

The following query displays results from an HBase database in binary format:

SELECT account['name'] FROM `hbase`.`students`

The following query displays the same results in string format:

SELECT CAST(account['name'] AS varchar(20)) FROM `hbase`.`students`

The following query displays results from a Parquet file in binary format:

SELECT column1 FROM `dfs`.`default`.`./opt/drill/test.parquet`

The following query displays the same results in string format:

SELECT CAST(column1 AS varchar(20)) FROM `dfs`.`default`.`./opt/drill/test.parquet`
Data Types
The Drill JDBC driver supports many common data types, converting between Drill, SQL, and Java data types. The driver supports the following data type mappings:
Drill Type    SQL Type     Java Type
BIGINT        BIGINT       BIGINT
BIT           BOOLEAN      BOOLEAN
DATE          DATE         DATE
DATETIME      DATE         DATE
FLOAT4        FLOAT        REAL
FLOAT8        DOUBLE       DOUBLE
INT           INT          INTEGER
SMALLINT      SMALLINT     SMALLINT
TIMESTAMP     TIMESTAMP    TIMESTAMP
TINYINT       TINYINT      TINYINT
VARCHAR       VARCHAR      VARCHAR
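For application code that needs these mappings programmatically, the following sketch records the SQL Type column of the table above as java.sql.Types constants. This is a convenience map for illustration, not part of the driver's API; the driver itself reports these values through ResultSetMetaData.

```java
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

public class DrillTypeMapping {
    // SQL type constants corresponding to each Drill type, per the
    // mapping table above.
    public static final Map<String, Integer> SQL_TYPES = new LinkedHashMap<>();
    static {
        SQL_TYPES.put("BIGINT", Types.BIGINT);
        SQL_TYPES.put("BIT", Types.BOOLEAN);
        SQL_TYPES.put("DATE", Types.DATE);
        SQL_TYPES.put("DATETIME", Types.DATE);
        SQL_TYPES.put("FLOAT4", Types.FLOAT);
        SQL_TYPES.put("FLOAT8", Types.DOUBLE);
        SQL_TYPES.put("INT", Types.INTEGER);
        SQL_TYPES.put("SMALLINT", Types.SMALLINT);
        SQL_TYPES.put("TIMESTAMP", Types.TIMESTAMP);
        SQL_TYPES.put("TINYINT", Types.TINYINT);
        SQL_TYPES.put("VARCHAR", Types.VARCHAR);
    }

    public static void main(String[] args) {
        // Look up the SQL type that a Drill FLOAT8 column maps to.
        System.out.println(SQL_TYPES.get("FLOAT8") == Types.DOUBLE);
    }
}
```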