Understanding the MapR-DB OJAI Connector for Spark

The MapR-DB OJAI Connector for Apache Spark enables you to build real-time and batch pipelines between your data and MapR-DB JSON. Before getting started, make sure you understand Spark terminology and workflow, system requirements and support, and the connector's API features.

The MapR-DB OJAI connector includes a set of APIs that enable you to write applications that consume MapR-DB JSON tables and use them in Spark. The MapR-DB OJAI Connector for Apache Spark is a companion to the MapR-DB Binary Connector for Apache Spark, which provides the equivalent functionality for MapR-DB Binary tables.

MapR-DB OJAI Connector with Spark Workflow

You can use the MapR-DB OJAI Connector to extract data from MapR-DB or MapR-FS, transform that data using either Spark or Spark SQL, and then load it into MapR-DB JSON.
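The extract, transform, and load steps described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes a MapR-provided PySpark build in which the SparkSession exposes `loadFromMapRDB` and `saveToMapRDB`, that the Python DataFrame API is available (MEP 4.1 or later), and that the target table already exists. The function name, the table paths, and the `age` field are hypothetical.

```python
def copy_adults(spark, source_table, target_table):
    """Extract from one MapR-DB JSON table, transform with Spark SQL,
    and load the result into another MapR-DB JSON table.

    `spark` is expected to be a SparkSession from a MapR-provided PySpark
    build; the connector methods used here are assumptions about that build.
    """
    # Extract: read the JSON table into a DataFrame (assumed connector call).
    people = spark.loadFromMapRDB(source_table)

    # Transform: register a temporary view and query it with Spark SQL.
    # The `age` field is hypothetical.
    people.createOrReplaceTempView("people")
    adults = spark.sql("SELECT * FROM people WHERE age >= 18")

    # Load: write the result to the target table (assumed connector call).
    # SELECT * keeps the _id field that each MapR-DB JSON document needs.
    spark.saveToMapRDB(adults, target_table)
    return adults
```

With a real session, a call might look like `copy_adults(spark, "/tmp/people", "/tmp/adults")`, where both paths are placeholders.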

MapR-DB OJAI Connector for Apache Spark Features

Principal features of the MapR-DB OJAI Connector for Apache Spark include the following:

  • Support for Scala and, beginning with MEP 4.1, Java and Python APIs
    This matrix shows the programming languages and features supported:

                  Scala   Java   Python
      RDD         Yes     Yes    No
      DataFrame   Yes     Yes    Yes
      Dataset     Yes     Yes    No
      DStream     Yes     No     No
  • APIs that enable you to load data from a MapR-DB JSON table to an Apache Spark RDD, DataFrame, or Dataset
  • Projection and filter pushdown for better performance
  • Custom partitioner for RDDs that enables you to partition data for better performance
  • APIs that save an Apache Spark RDD, DataFrame, or DStream to a MapR-DB JSON table using either normal or bulk insert
  • Support for Scala and Java bean classes
  • Support for data locality
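Two of the features listed above, projection and filter pushdown and the choice between normal and bulk insert, can be illustrated with a short Python sketch. It assumes the same MapR-provided PySpark session methods (`loadFromMapRDB`, `saveToMapRDB`) and assumes that `saveToMapRDB` accepts the keyword arguments `id_field_path`, `create_table`, and `bulk_insert`; verify these names against your MEP release. The function name, table paths, and field names are hypothetical. Whether a given predicate is actually pushed down depends on the connector version.

```python
def save_smiths(spark, source_table, target_table, bulk=True):
    """Load a MapR-DB JSON table, narrow it, and save it to another table."""
    df = spark.loadFromMapRDB(source_table)  # assumed connector call

    # Apply the projection and the filter directly on the loaded DataFrame,
    # before any other transformation, so the connector has the chance to
    # push both down to MapR-DB instead of reading every field and row.
    narrowed = (
        df.select("_id", "first_name", "last_name")  # hypothetical fields
          .filter("last_name = 'Smith'")
    )

    # Save with either normal or bulk insert. The keyword names below are
    # assumptions about the MapR PySpark API.
    spark.saveToMapRDB(
        narrowed,
        target_table,
        id_field_path="_id",
        create_table=True,
        bulk_insert=bulk,
    )
    return narrowed
```

Passing `bulk=False` selects the normal insert path in this sketch; everything else stays the same.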

The following features are not supported:

  • MapR-DB Binary tables

    Only MapR-DB JSON tables are supported; access to MapR-DB Binary tables is provided through the MapR-DB Binary Connector for Apache Spark.

  • Secondary indexes

Supported Product Versions and System Requirements

To use the MapR-DB OJAI Connector for Apache Spark, you must have the following minimum software versions:

  • MapR 5.2.1 or later
  • MEP 3.0 or later
  • Spark 2.1.0 or later
  • Scala 2.11 or later
  • Java 8 or later

Support for DataFrames and Datasets is available starting in the MEP 4.0 release.
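As a convenience, the minimum versions listed above can be checked programmatically before a job is deployed. The following is a minimal sketch in plain Python; the names `MINIMUMS` and `unmet_requirements` are hypothetical, and it assumes version strings made only of dot-separated integers (for example, Java given as "8" rather than "1.8").

```python
# Minimum versions taken from the list above.
MINIMUMS = {
    "MapR": (5, 2, 1),
    "MEP": (3, 0),
    "Spark": (2, 1, 0),
    "Scala": (2, 11),
    "Java": (8,),
}


def parse_version(text):
    """Turn a string such as '2.1.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets(found, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(found) >= pad(minimum)


def unmet_requirements(installed):
    """Return the component names that are missing or below the minimum."""
    problems = []
    for name, minimum in MINIMUMS.items():
        version = installed.get(name)
        if version is None or not meets(parse_version(version), minimum):
            problems.append(name)
    return problems
```

For example, calling `unmet_requirements` with MapR "5.2.0" and Scala "2.10" alongside otherwise sufficient versions reports those two components.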

OJAI API

The MapR-DB OJAI Connector for Apache Spark uses the OJAI API internally to access MapR-DB JSON tables.