10 min read
Apache Drill enables querying with SQL against a multitude of data sources, including JSON files, Parquet and Avro, Hive tables, RDBMS, and more. MapR has released an ODBC driver for it, and I thought it'd be neat to get it to work with OBIEE. It evidently does work for OBIEE running on Windows, but I wanted to be able to use it on my standard environment, Linux.
For more information on Apache Drill, see my previous post, Introduction to Apache Drill.
OBIEE 12c (and 11g and 10g before it) supports three primary ways of connecting to data sources:
To get OBIEE to work with Apache Drill, we'll use the third option - native ODBC drivers. I'm doing this on SampleApp v511. For a geek-out on details of the process I went through and diagnostics used to get this to work, see here.
Drill can be deployed in distributed configuration (with all the parallel processing goodness which that brings), but also run as a single instance locally. For the sake of simplicity, that's what I'm going to do here. It's rather easy to do:
# Download Apache Drill wget http://www.apache.org/dyn/closer.cgi/drill/drill-1.7.0/apache-drill-1.7.0.tar.gz # Unpack tar -xvf apache-drill-1.7.0.tar.gz # Run cd /opt/apache-drill-1.7.0/ && bin/sqlline -u jdbc:drill:zk=local
You need to make sure you've got a recent JDK available, and if you're running it on BigDataLite VM, watch out for this odd problem that I had which was related to classpaths and maniested itself with the error
All being well, you'll now have a Drill prompt:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Aug 09, 2016 5:51:43 AM org.glassfish.jersey.server.ApplicationHandler initialize INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26... apache drill 1.7.0 "say hello to my little drill" 0: jdbc:drill:zk=local>
From here you can run a simple query to check the version:
0: jdbc:drill:zk=local> SELECT version FROM sys.version; +----------+ | version | +----------+ | 1.7.0 | +----------+ 1 row selected (0.392 seconds)
or query one of the built-in sample data sets:
0: jdbc:drill:zk=local> select count(*) from cp.`employee.json`; +---------+ | EXPR$0 | +---------+ | 1155 | +---------+ 1 row selected (0.977 seconds)
For more examples of Drill, see the tutorials.
With Drill setup and running locally, let's now install the ODBC driver. This is all on SampleApp v511 / Oracle Linux 6.7.
sudo rpm -i https://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v22.214.171.1240/MapRDrillODBC-1.2.1.x86_64.rpm
Next, create the MapR Drill ODBC driver configuration file. A sample one is provided, which you can copy from the default installation path of
/opt/mapr/drillodbc/Setup/mapr.drillodbc.ini, or you can create a new one. I put it in the default path (
[Driver] DisableAsync=0 DriverManagerEncoding=UTF-16 ErrorMessagesPath=/opt/mapr/drillodbc/ErrorMessages LogLevel=2 LogPath=/tmp/odbc.mapr SwapFilePath=/tmp ODBCInstLib=libodbcinst.so
In the above, I've changed a few things:
DriverManagerEncoding. If you leave this as the default of
UTF-32OBIEE will crash (SIGSEGV) when you try to query the data in Apache Drill. You can read all about my trials and tribulations trying to figure this out in a separate blog post coming soon.
LogLevelto 2 and
LogPathto a valid path, so that there's some log files to check if things go wrong.
Following the documentation, Configuring Database Connections Using Native ODBC Drivers:
Add the necessary environment variables to BI Server. This is done per-component in a
.properties file, which for the BI Server (OBIS / nqsserver) is
BI_DOMAIN/config/fmwconfig/bienv/obis/obis.properties, so on SampleApp /app/oracle/biee/user_projects/domains/bi/config/fmwconfig/bienv/OBIS/obis.properties. To this file (which in 12.2.1 is empty by default) we add:
Add the Drill DSN to
odbc.ini, which for OBIEE already exists and is populated with other ODBC configurations. You'll find the file in
BI_DOMAIN/config/fmwconfig/bienv/core, which on SampleApp is
[ODBC Data Sources]section
DrillDSN=MapR Drill ODBC Driver 64-bit
Add a section to the bottom of the file:
[DrillDSN] Driver=/opt/mapr/drillodbc/lib/64/libmaprdrillodbc64.so AuthenticationType=No Authentication Description=Drill ODBC Driver ConnectionType=Direct HOST=localhost PORT=31010
Now restart the BI Server:
/app/oracle/biee/user_projects/domains/bi/bitools/bin/stop.sh -i obis1 && /app/oracle/biee/user_projects/domains/bi/bitools/bin/start.sh -i obis1
On a Windows machine, I installed the MapR Drill ODBC driver, too, and created a DSN of the same name as in my
odbc.ini file above. In the Administration Tool, I set up a new database (type: ODBC Basic) and associated connection pool (ODBC 2.0) pointing to DrillDSN.
Now to try it out! In Answers, I build a Direct Database Request:
and run it
Nice. We can query data held in HDFS too, again with a Direct Database Request:
So the above DDR proves the connectivity works. But as any fool knows, DDR is at best a "tactical' solution"—at worst, a complete hack and maintenance nightmare. Let's use the force, Luke, or at least, the RPD. The first obvious thing to do is import metadata from the connection pool that we've defined. But doing this, there are no objects shown:
That's because 'tables' in Drill are not quite as clearly defined as in a standard RDBMS. A table could be a single file, multiple files matching a pattern, or even literally a table if connecting Drill to an RDBMS. So to expose a set of data through Drill, we define a view. This is where Drill Explorer comes in, as it gives a simple GUI over the available files
from where you can use the SQL tab and Create As option to create a view.
Having done this, launch the Import Metadata dialog again (right click the Connection Pool and select Import Metadata), and make sure you tick Views on the Metadata types to view. Now you'll see the object. Unfortunately, it just has a single column -
*. I've not figured out yet how—if if it's possible—to get a view to explode out all columns in the underlying select clause. Import the view:
You'll get an error about the
* column name, but the table and schema still get brought across.
Now the slightly tedious bit—define each physical column, and define the physical and logical model, done very simplistically here:
A simple query:
Aaaaaand a simple error:
State: HY000\. Code: 16001\. [nQSError: 16001] ODBC error state: S1000 code: 1040 message: [MapR][Drill] (1040) Drill failed to execute the query: select avg(T29568."stars") as c1, T29568."city" as c2, T29568."full_address" as c3, T29568."name" as c4 from "DRILL"."dfs.tmp"."yelp_business" T29568 group by T29568."city", T29568."full_address", T29568."name" order by 2, 4, 3 Query execution error. Details:[ PARSE ERROR: Encountered ". \"" at line 1, column 33. Was expecting one of: ")" ... "ORDER" ... "LIMIT" ... "OFFSET" ... "FETCH" ... "," .... (HY000)
Looking at the query being run, OBIEE is using double quotation marks (
") to quote identifiers, but Drill requires backtick (
`) instead. Heading over to DB Features can fix this:
And refreshing the report gives:
This is the very basics that are necessary to get up and running with OBIEE and Apache Drill. It would be good to see if there's an optimal, least-friction way for getting tables in Drill exposed to OBIEE without needing to enter each physical column.
One of the many powerful features of Drill is being able to access nested and array JSON values, which I've discussed in my Introduction to Apache Drill post. The above examples just use root-level attributes, and could easily be expanded out to process some of the nested fields (such as hours in the business data above). For the time being, this would be done with DDR, or a Drill view wrapped around it imported into the physical layer of the RPD.
For another method on how to do this, you can view this blog post.
This was originally published here.
Stay ahead of the bleeding edge...get the best of Big Data in your inbox.