There are many ways to interact with MapR Database, as we have explored in previous posts such as Interacting with MapR Database, MapR Database Spark Connector with Secondary Indexes Support, and MapR Database Atomic Document Updates.
One of the most interesting and well-known paths to query MapR data tables is using the OJAI API since it is suitable for most use cases and we can issue queries in many languages, including Java, Node.js, Python, C#, Go, and others.
However, when running on the JVM, we prefer Scala, given all the advantages this functional language offers. Also, we have seen that people using Apache Spark are inclined to use Scala, and they end up using the same language for other parts of their ecosystem.
One of the main constructs of the OJAI API is called QueryCondition. This is how we define the conditions we are going to execute against our MapR Database tables. These conditions are formed by composing all kinds of filters that MapR Database translates and executes.
The OJAI API functions around QueryCondition rely on method overloading for different data types. In other words, in order to build a query, we need to know, explicitly, the exact type we are using. While this seems like a good idea, it is often far from convenient.
To deal with these issues, we have created a library. The ojai-generics project presents a very thin layer on top of OJAI that eases working with OJAI from Scala by adding idiomatic constructs.
We can use ojai-generics to reduce the boilerplate code we are forced to write when using the Java-like OJAI API, replacing it with polymorphic and idiomatic Scala.
Let's build an example from scratch to show in more detail the advantages of using ojai-generics.
Suppose we want to create a QueryCondition for a value coming from a Spark DataFrame, where the concrete type of the value is normally not known until runtime.
The problem here is that we need to find out the real type of value so that we can pass it to QueryCondition.is. This becomes a problem when we have many types and many operations: the type-by-type matching must be repeated for every combination of operation and type that our application requires. Repetition and code duplication then start to appear everywhere, and we should avoid them at all costs.
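To make the duplication concrete, here is a minimal, self-contained sketch. It does not use the real OJAI classes: Op, ToyCondition, ToyBuilder, and buildCondition are hypothetical stand-ins that only mimic an API with one overload per value type, and the rendered string format is made up for illustration.

```scala
// Hypothetical stand-ins, NOT the real OJAI API: they only mimic a
// Java-style builder that exposes one overload per value type.
object OverloadSketch {
  sealed abstract class Op(val symbol: String)
  case object Equal extends Op("=")
  case object GreaterOrEqual extends Op(">=")

  // A toy condition that just keeps its textual form.
  final case class ToyCondition(rendered: String)

  object ToyBuilder {
    private def render(field: String, op: Op, literal: String): ToyCondition =
      ToyCondition(s"($field ${op.symbol} $literal)")

    // One overload per supported type, as in an overloading-based API.
    def is(field: String, op: Op, value: Int): ToyCondition     = render(field, op, value.toString)
    def is(field: String, op: Op, value: Long): ToyCondition    = render(field, op, value.toString + "L")
    def is(field: String, op: Op, value: Double): ToyCondition  = render(field, op, value.toString)
    def is(field: String, op: Op, value: Boolean): ToyCondition = render(field, op, value.toString)
    def is(field: String, op: Op, value: String): ToyCondition  = render(field, op, "\"" + value + "\"")
  }

  // The caller only has an untyped value, so it must recover the runtime
  // type by hand in order to select the right overload.
  def buildCondition(field: String, op: Op, value: Any): ToyCondition = value match {
    case v: Int     => ToyBuilder.is(field, op, v)
    case v: Long    => ToyBuilder.is(field, op, v)
    case v: Double  => ToyBuilder.is(field, op, v)
    case v: Boolean => ToyBuilder.is(field, op, v)
    case v: String  => ToyBuilder.is(field, op, v)
    case _          => throw new IllegalArgumentException(s"Unsupported value for field $field")
  }
}
```

In this sketch, every additional value type means one more case in the match, and any other place in the application that receives an untyped value needs the same match again.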
With ojai-generics, we can do this as follows:
We have reduced the previously shown code by removing the pattern matching on the types. That means we are using a polymorphic API that is able to accept all possible types. Also, ojai-generics takes care of the casts and conversions for us. Additionally, it adds some operators (such as >=), so we can think about these comparisons in a natural way.
If we prefer a more Java-like API, we can still do the following without losing type safety or generics:
These functions are aliases that produce the same results as before, yet using a more verbose approach.
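The idea behind the operator style and its more verbose aliases can be illustrated with a small, self-contained sketch. This is not the ojai-generics implementation or its real API: Field, ToyCondition, the operator set, and the alias names equalTo and greaterOrEqual are all hypothetical, chosen only to show how a thin layer can keep the type dispatch in one place behind generic methods.

```scala
// Hypothetical sketch, NOT the ojai-generics API: a thin generic layer
// that hides the per-type dispatch behind polymorphic operators.
object GenericSketch {
  // A toy condition that just keeps its textual form.
  final case class ToyCondition(rendered: String)

  // The only place where runtime types are inspected.
  private def literal(value: Any): String = value match {
    case v: String  => "\"" + v + "\""
    case v: Long    => v.toString + "L"
    case v: Int     => v.toString
    case v: Double  => v.toString
    case v: Boolean => v.toString
    case _          => throw new IllegalArgumentException("Unsupported value type")
  }

  final case class Field(name: String) {
    private def compare(op: String, value: Any): ToyCondition =
      ToyCondition(s"($name $op ${literal(value)})")

    // Operator style: callers never match on the type themselves.
    def ===[A](value: A): ToyCondition = compare("=", value)
    def >=[A](value: A): ToyCondition  = compare(">=", value)
    def <[A](value: A): ToyCondition   = compare("<", value)

    // More verbose aliases that delegate to the operators.
    def equalTo[A](value: A): ToyCondition        = this === value
    def greaterOrEqual[A](value: A): ToyCondition = this >= value
  }
}
```

With these definitions, GenericSketch.Field("age") >= 21 and GenericSketch.Field("age").greaterOrEqual(21) build the same toy condition, and a value statically typed as Any is handled by the same call.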
We can validate our queries by running the following tests:
Using these tests, we can verify that ojai-generics outputs the same queries as OJAI Java. This is expected, since at the end of the day we are only putting some syntax and type transformations on top of OJAI while exposing a polymorphic interface.
The code for ojai-generics can be found in this GitHub repo, or we can get the binaries directly from Maven Central in the following way:
Maven:

<dependency>
  <groupId>com.github.anicolaspp</groupId>
  <artifactId>ojai-scala-generics_2.11</artifactId>
  <version>1.0.0</version>
</dependency>

sbt:

libraryDependencies += "com.github.anicolaspp" % "ojai-scala-generics_2.11" % "1.0.0"
It is important to note that we are only adding a thin layer on top of the existing OJAI API. Everything that works there will work while using our library: we only add extended functionality and do not modify existing functionality in any way. Our library, ojai-generics, requires that you link the corresponding OJAI dependencies, since they are not packaged with it. The ojai-generics lib should be used in addition to the OJAI libraries.