Understanding the MapR-DB OJAI Connector for Spark

Using the MapR-DB OJAI Connector for Spark enables you to build real-time and batch pipelines between your data and MapR-DB JSON. Before getting started, it is important that you understand Spark terminology and workflow, system requirements and support, and OJAI connector and API features.

The connector includes a set of APIs that enable MapR users to write applications that consume MapR-DB JSON tables and use them in Spark. The MapR-DB OJAI Connector for Apache Spark is a companion to the MapR-DB Binary Connector for Apache Spark, which provides the equivalent functionality for MapR-DB Binary tables.

Sample Batch Data Transformation Spark Workflow

You can use the MapR-DB OJAI Connector with batch data. In this workflow, data from MapR-DB or MapR-FS is extracted and transformed using either Spark or Spark SQL, and then loaded into MapR-DB JSON.

MapR-DB OJAI Connector for Apache Spark Features

Principal features of the MapR-DB OJAI Connector for Apache Spark include the following:

  • Support for Scala and, beginning with MEP 4.1, Java and Python APIs
  • APIs that enable you to load data from a MapR-DB JSON table to an Apache Spark RDD, DataFrame, or Dataset
  • Projection and filter pushdown for better performance
  • Custom partitioner for RDDs that enables you to partition data for better performance
  • APIs that save an Apache Spark RDD, DataFrame, or DStream to a MapR-DB JSON table using either normal or bulk insert
  • Support for Scala and Java bean classes
  • Data locality
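
The load, pushdown, and save features listed above can be sketched in PySpark as follows. This is a minimal sketch, not a definitive implementation. It assumes a SparkSession from a MapR-provided PySpark build (MEP 4.1 or later), where the connector adds loadFromMapRDB and saveToMapRDB to the session. These method names and keyword arguments are written from memory of the MapR Python API and should be verified against your release. The function name, table paths, and column names (active, first_name, last_name) are hypothetical.

```python
def copy_active_users(spark, source_table, target_table):
    """Load a MapR-DB JSON table, narrow it, and save the result.

    `spark` is assumed to be a SparkSession from a MapR-provided PySpark
    build. loadFromMapRDB and saveToMapRDB are connector extensions
    (assumed names); they are not part of stock Apache Spark.
    """
    # Load the JSON table as a DataFrame.
    df = spark.loadFromMapRDB(source_table)

    # Express the filter and projection through the DataFrame API so the
    # connector has the chance to push them down to MapR-DB.
    # The column names here are hypothetical.
    active = df.filter("active = true").select("_id", "first_name", "last_name")

    # Save the result; create_table=True asks the connector to create the
    # target table (assumed keyword arguments).
    spark.saveToMapRDB(active, target_table, id_field_path="_id", create_table=True)
    return active
```

A call might look like copy_active_users(spark, "/tmp/user_profiles", "/tmp/active_users"), where both paths are placeholders for MapR-DB JSON table paths.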

The following features are not supported:

  • MapR-DB Binary tables

    Only MapR-DB JSON tables are supported; access to MapR-DB Binary tables is provided through the MapR-DB Binary Connector.

  • Secondary indexes

This matrix shows the programming languages and features supported:

               Scala   Java   Python
  RDD          Yes     Yes    No
  DataFrame    Yes     Yes    Yes
  Dataset      Yes     Yes    No
  DStream      Yes     No     No

Note: Examples for topics include Scala, Java, and Python implementations. If any of these implementations are missing, the feature is not supported for that language.
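
If you want to check language support programmatically, for example in a project template or a build script, the matrix can be encoded as a small lookup. The sketch below only transcribes the matrix above; the names SUPPORT, is_supported, and languages_for are hypothetical helpers and are not part of the connector.

```python
# Support matrix transcribed from the documentation above.
SUPPORT = {
    "RDD":       {"Scala": True, "Java": True,  "Python": False},
    "DataFrame": {"Scala": True, "Java": True,  "Python": True},
    "Dataset":   {"Scala": True, "Java": True,  "Python": False},
    "DStream":   {"Scala": True, "Java": False, "Python": False},
}


def is_supported(abstraction, language):
    """Return True if the connector supports `abstraction` from `language`."""
    return SUPPORT[abstraction][language]


def languages_for(abstraction):
    """Return the languages that support `abstraction`, in matrix column order."""
    return [lang for lang, ok in SUPPORT[abstraction].items() if ok]
```

For example, is_supported("DataFrame", "Python") returns True, and languages_for("DStream") returns ["Scala"].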

Supported Product Versions and System Requirements

To use the MapR-DB OJAI Connector for Apache Spark, you must have the following minimum software versions:

  • MapR 5.2.1 or later
  • MEP 3.0 or later
  • Spark 2.1.0 or later
  • Scala 2.11 or later
  • Java 8 or later

Support for DataFrames and Datasets is available starting in the MEP 4.0 release.
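
The minimum versions above can be turned into a simple pre-flight check. The following sketch is illustrative only: MINIMUMS, parse_version, unmet_requirements, and supports_dataframes are hypothetical names, and the code assumes version strings made of dot-separated integers (for example "2.1.0", or "8" for Java rather than "1.8.0_181"). How you obtain the installed versions is left to your environment.

```python
# Minimum versions taken from the requirements listed above.
MINIMUMS = {
    "MapR": (5, 2, 1),
    "MEP": (3, 0),
    "Spark": (2, 1, 0),
    "Scala": (2, 11),
    "Java": (8,),
}

# DataFrame and Dataset support starts with MEP 4.0.
DATAFRAME_MIN_MEP = (4, 0)


def parse_version(text):
    """Convert a dotted version string such as '2.1.0' to a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def unmet_requirements(installed):
    """Return the components that are missing or below their minimum version.

    `installed` maps component names (the keys of MINIMUMS) to version strings.
    """
    unmet = []
    for name, minimum in MINIMUMS.items():
        current = installed.get(name)
        if current is None or parse_version(current) < minimum:
            unmet.append(name)
    return unmet


def supports_dataframes(mep_version):
    """Return True if the given MEP version includes DataFrame/Dataset support."""
    return parse_version(mep_version) >= DATAFRAME_MIN_MEP
```

With this sketch, unmet_requirements({"MapR": "5.2", "MEP": "3.0", "Spark": "2.1.0", "Scala": "2.11", "Java": "8"}) returns ["MapR"], because 5.2 is below the 5.2.1 minimum.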

OJAI API

The MapR-DB OJAI Connector for Apache Spark uses the OJAI API internally to access MapR-DB JSON tables.