Using Serialization with the MapR-DB OJAI Connector for Apache Spark

In the context of the MapR-DB OJAI Connector for Apache Spark, serialization refers to the methods that convert objects to bytes and bytes back to objects. The Spark cluster framework requires serialization to exchange objects between the driver and the cluster executors. This type of serialization is unrelated to the way MapR-DB serializes objects on disk.
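As a minimal sketch of what converting an object to bytes and back looks like, the following uses plain Java serialization from the standard library. The SensorReading and RoundTripDemo names are hypothetical and are not part of Spark or the connector. The example only illustrates the round trip that takes place when an object moves between the driver and an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical type; stands in for any object that Spark ships to an executor.
class SensorReading implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final double value;

    SensorReading(String id, double value) {
        this.id = id;
        this.value = value;
    }
}

class RoundTripDemo {
    // Object -> bytes.
    static byte[] toBytes(Object obj) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes -> object.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SensorReading original = new SensorReading("s1", 21.5);
        SensorReading copy = (SensorReading) fromBytes(toBytes(original));
        System.out.println(copy.id + " " + copy.value);
    }
}
```

The copy is a distinct object with the same field values as the original, which is what an executor receives.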

Classes used in Spark transformations or actions must be serializable; otherwise, Spark reports an exception to the user. For this reason, the classes created for the MapR-DB OJAI Connector for Apache Spark are serializable.
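The following sketch shows the underlying Java behavior, without Spark: writing an object whose class does not implement Serializable fails with NotSerializableException. The PlainFormatter, SerializableFormatter, and SerializableCheck names are hypothetical. In a Spark job, the same problem typically surfaces as a "Task not serializable" error when a transformation references such an object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper that does NOT implement Serializable.
class PlainFormatter {
    String format(int n) {
        return "#" + n;
    }
}

// The same helper, marked Serializable.
class SerializableFormatter implements Serializable {
    private static final long serialVersionUID = 1L;

    String format(int n) {
        return "#" + n;
    }
}

class SerializableCheck {
    // Returns true if Java serialization accepts the object, and false if it
    // rejects the object with NotSerializableException.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new PlainFormatter()));
        System.out.println(canSerialize(new SerializableFormatter()));
    }
}
```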

Spark uses Java serialization by default, but it can optionally use Kryo serialization instead. With Kryo, you can register a serializer class for a particular class. Kryo serialization generally provides better performance than Java serialization.

An OJAI document can have complex and primitive value types. Java can serialize primitive value types. For complex types, such as Map, Array, and Binary, new serializable wrappers have been created. See “Working with Complex JSON Document Types.”

Time-related data types, such as ODate, OInterval, OTime, and OTimeStamp, are serializable by default (they implement the Serializable interface). For efficiency, new serializers and comparators have been created for these data types. A new Kryo registrator is also provided so that you can avoid using the default Java serialization.

Here is the list of new serializers:

  • ODateSerializer - Serializer for ODate type
  • OTimeSerializer - Serializer for OTime type
  • OTimeStampSerializer - Serializer for OTimeStamp type
  • OIntervalSerializer - Serializer for OInterval type
  • DBBinaryValueSerializer - Used to serialize ByteBuffer values
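To illustrate why a type-specific serializer can be more efficient than default Java serialization, the following sketch compares the two for a hypothetical date-like class. DayStamp and CompactEncodingDemo are illustrative names only. The code does not reproduce the connector's serializers and assumes nothing about their actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical date-like type: a single int, days since 1970-01-01.
class DayStamp implements Serializable {
    private static final long serialVersionUID = 1L;
    final int daysSinceEpoch;

    DayStamp(int daysSinceEpoch) {
        this.daysSinceEpoch = daysSinceEpoch;
    }
}

class CompactEncodingDemo {
    // Default Java serialization: stream header and class descriptor plus the field.
    static byte[] javaEncode(DayStamp d) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(d);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Type-specific encoding: only the single int that defines the value.
    static byte[] compactEncode(DayStamp d) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(d.daysSinceEpoch);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds the value from the compact encoding.
    static DayStamp compactDecode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new DayStamp(in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DayStamp d = new DayStamp(17000);
        System.out.println("Java serialization: " + javaEncode(d).length + " bytes");
        System.out.println("Compact encoding:   " + compactEncode(d).length + " bytes");
    }
}
```

The compact form writes only the four bytes of the int, whereas the default form also carries class metadata. A serializer written for a known type can skip that metadata.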

The following example shows how to set the new Kryo registrator in SparkConf:

new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.mapr.db.spark.OJAIKryoRegistrator")