Extending MapR Database Queries Using Scala Polymorphic Types

When working with MapR Database, there are many ways to interact with it, as we have explored in previous posts such as Interacting with MapR Database, MapR Database Spark Connector with Secondary Indexes Support, and MapR Database Atomic Document Updates.

One of the most interesting and well-known paths to query MapR data tables is using the OJAI API since it is suitable for most use cases and we can issue queries in many languages, including Java, Node.js, Python, C#, Go, and others.

However, when running on the JVM, we use Scala, given the advantages this functional language offers. Also, we have seen that people using Apache Spark are inclined to use Scala, and they end up using the same language for other parts of their ecosystem.

One of the main constructs of the OJAI API is called QueryCondition. This is how we define the conditions we are going to execute against our MapR Database tables. These conditions are formed by composing all kinds of filters that MapR Database translates and executes.

The OJAI API's QueryCondition relies on method overloading for the different data types. In other words, in order to build a query, we need to know, explicitly, the exact type we are using. While this seems like a good idea, it is far from convenient on most occasions.

In order to deal with these issues, we have created a library. The ojai-generics project presents a very thin layer on top of OJAI that eases working with OJAI from Scala by adding idiomatic constructs.

Using ojai-generics

We can use ojai-generics to reduce the boilerplate code we are forced to write when using the OJAI Java-like API by using polymorphic and idiomatic Scala.

Let's build an example from scratch to show in more detail the advantages of using ojai-generics.

Suppose we want to create a QueryCondition for a value coming from a Spark DataFrame, normally coming as Any.

The problem here is that we need to find out the real type of the value so that we can pass it to the matching QueryCondition.is overload. This becomes a problem when we have many types and many operations: the type matching done for one operation must be repeated for every combination of operation and type that our application requires. Repetition and code duplication then start to appear everywhere, and we should avoid them at all costs.
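The type matching described above can be sketched as follows. This is a minimal, self-contained illustration, not OJAI code: `Builder` is a hypothetical stand-in that mimics the one-overload-per-type shape of QueryCondition.is and renders plain strings, and the operator symbols and the `equalTo` and `lessThan` helpers are assumptions made for the example.

```scala
object OverloadedSketch {
  // Hypothetical stand-in for a Java-style API: one overload per value type.
  object Builder {
    def is(field: String, op: String, v: Int): String     = s"($field $op $v)"
    def is(field: String, op: String, v: Long): String    = s"($field $op ${v}L)"
    def is(field: String, op: String, v: Double): String  = s"($field $op $v)"
    def is(field: String, op: String, v: Boolean): String = s"($field $op $v)"
    def is(field: String, op: String, v: String): String  = s"""($field $op "$v")"""
  }

  // A value read from a DataFrame row arrives as Any, so the caller has to
  // recover its runtime type before an overload can be selected.
  def equalTo(field: String, value: Any): String = value match {
    case v: Int     => Builder.is(field, "=", v)
    case v: Long    => Builder.is(field, "=", v)
    case v: Double  => Builder.is(field, "=", v)
    case v: Boolean => Builder.is(field, "=", v)
    case v: String  => Builder.is(field, "=", v)
    case other      => throw new IllegalArgumentException(s"unsupported value: $other")
  }

  // The same match has to be written again for every other operator.
  def lessThan(field: String, value: Any): String = value match {
    case v: Int     => Builder.is(field, "<", v)
    case v: Long    => Builder.is(field, "<", v)
    case v: Double  => Builder.is(field, "<", v)
    case v: Boolean => Builder.is(field, "<", v)
    case v: String  => Builder.is(field, "<", v)
    case other      => throw new IllegalArgumentException(s"unsupported value: $other")
  }
}
```

With five types and six comparison operators, this style already needs thirty nearly identical cases, which is the duplication the paragraph above warns about.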

Using ojai-generics, we can do this as follows:

We have reduced the code by removing the pattern matching on the types. That means we are using a polymorphic API that is able to accept all possible types. Also, ojai-generics takes care of the casts and conversions for us. Additionally, it adds some operators (===, =!=, <, <=, >, >=), so we can think about these comparisons in a natural way.
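To make the idea concrete, here is one way such a polymorphic layer could be structured. It is a sketch under stated assumptions and not the source of ojai-generics: `Cond`, `Field`, `field`, and `render` are hypothetical names, and the condition is modeled as a small case class instead of a real QueryCondition. The point it illustrates is that the runtime type is resolved in a single place, and every operator reuses it.

```scala
object GenericSketch {
  // Hypothetical stand-in for a built condition.
  final case class Cond(field: String, op: String, value: String)

  // The only place where the runtime type is inspected and converted.
  private def render(value: Any): String = value match {
    case v: String                             => "\"" + v + "\""
    case v: Long                               => v.toString + "L"
    case v @ (_: Int | _: Double | _: Boolean) => v.toString
    case other => throw new IllegalArgumentException(s"unsupported value: $other")
  }

  final class Field(val name: String) {
    private def cond[A](op: String, value: A): Cond = Cond(name, op, render(value))

    // Operators accept any value type; no per-type overloads are needed.
    def ===[A](value: A): Cond = cond("=", value)
    def =!=[A](value: A): Cond = cond("!=", value)
    def <[A](value: A): Cond   = cond("<", value)
    def <=[A](value: A): Cond  = cond("<=", value)
    def >[A](value: A): Cond   = cond(">", value)
    def >=[A](value: A): Cond  = cond(">=", value)

    // More verbose aliases that produce the same conditions as the operators.
    def equalTo[A](value: A): Cond  = this === value
    def lessThan[A](value: A): Cond = this < value
  }

  def field(name: String): Field = new Field(name)
}
```

A value typed as Any can then be used directly, for example `GenericSketch.field("age") === fromRow`, where `fromRow` is whatever came out of the DataFrame row.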

If we prefer a more Java-like API, we can still do the following without losing type safety or generics:

These functions are aliases that produce the same results as before, yet using a more verbose approach.

We can validate our queries by running the following tests:

Using these tests, we can verify that ojai-generics outputs the same queries as OJAI Java. This is expected, since at the end of the day we are only putting some syntax and type transformations on top of OJAI while exposing a polymorphic interface.
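The shape of such an equivalence check can be sketched without a cluster. The block below is an assumption-laden illustration: `Typed` is a hypothetical stand-in for the typed, overloaded API, and `generic` is a hypothetical generic layer that resolves the type once and delegates to it. Because the generic call ends in the same typed call, both sides should render the same condition.

```scala
object EquivalenceSketch {
  // Hypothetical stand-in for the typed, overloaded API.
  object Typed {
    def is(field: String, op: String, v: Int): String    = s"($field $op $v)"
    def is(field: String, op: String, v: String): String = s"""($field $op "$v")"""
  }

  // Hypothetical generic layer: resolves the type once, then delegates.
  def generic(field: String, op: String, value: Any): String = value match {
    case v: Int    => Typed.is(field, op, v)
    case v: String => Typed.is(field, op, v)
    case other     => throw new IllegalArgumentException(s"unsupported value: $other")
  }

  // Each pair holds a generic call and the typed call it should equal.
  val cases: Seq[(String, String)] = Seq(
    generic("age", "=", 25)      -> Typed.is("age", "=", 25),
    generic("name", "!=", "Ann") -> Typed.is("name", "!=", "Ann")
  )

  def allEqual: Boolean = cases.forall { case (g, t) => g == t }
}
```

Against a real table, the same comparison would be made between the built conditions, for instance by comparing their string forms.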

The code for ojai-generics can be found in this GitHub repo, or we can get the binaries directly from Maven Central in the following way:




libraryDependencies += "com.github.anicolaspp" % "ojai-scala-generics_2.11" % "1.0.0"

Important Notice

It is important to note that we are only adding a thin layer on top of the existing OJAI API. Everything that works there will work while using our library. We only add extended functionality. We don't modify existing functionality in any way. Our library, ojai-generics, requires that you link the corresponding OJAI dependencies since they are not packaged with it. The ojai-generics library should be used in addition to the OJAI libraries.
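In an sbt build, linking both could look like the fragment below. Treat it as a sketch: the resolver URL and the OJAI driver coordinates and version are assumptions that should be matched to the MapR release your cluster runs.

```scala
// build.sbt (sketch): ojai-generics next to the OJAI driver it expects on the classpath.

// Assumption: repository that hosts the MapR artifacts.
resolvers += "MapR Releases" at "https://repository.mapr.com/maven/"

libraryDependencies ++= Seq(
  "com.github.anicolaspp" % "ojai-scala-generics_2.11" % "1.0.0",
  // Assumption: driver coordinates and version; align them with your cluster.
  "com.mapr.ojai" % "mapr-ojai-driver" % "6.1.0-mapr"
)
```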

This blog post was published April 17, 2019.
