Converting Data to Supported Elasticsearch Data Types

This section describes how MapR-DB data can be converted to the Elasticsearch data types that MapR-DB supports.

If you want to convert source data (which is stored as byte arrays) to Elasticsearch types that MapR-DB supports, you can create each destination index explicitly with Elasticsearch’s create index API and then define the mapping of data types with Elasticsearch’s put mapping API. MapR gateways perform the data conversion.

Here is a list of the data types that gateways can convert your source data into by using this method:

  • Binary: a base64 representation of binary data that can be stored in an index.
  • Core Elasticsearch data types
    • boolean
    • byte
    • double
    • float
    • integer
    • long
    • short
    • string
    • date
    • geolocation
    • IP addresses
Note: Gateways use java.nio.ByteBuffer to convert source data to boolean, byte, double, float, integer, long, short, and date data types. IP addresses and geolocations are passed as strings.
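
As a rough illustration of what this fixed-width decoding implies for your source bytes, the following Python sketch mimics reading values the way java.nio.ByteBuffer does with its default big-endian byte order. The decode function and the format table are hypothetical names used only for this example. The boolean handling (zero is false, any other byte is true) is an assumption, not documented gateway behavior.

```python
import struct

# Big-endian formats, mirroring java.nio.ByteBuffer's default byte order.
# The mapping from Elasticsearch type names to widths is illustrative.
FORMATS = {
    "boolean": ">?",  # 1 byte; assumption: 0 is false, anything else is true
    "byte": ">b",     # 1 byte, signed
    "short": ">h",    # 2 bytes, signed
    "integer": ">i",  # 4 bytes, signed
    "long": ">q",     # 8 bytes, signed
    "float": ">f",    # 4 bytes, IEEE 754
    "double": ">d",   # 8 bytes, IEEE 754
    "date": ">q",     # 8 bytes: milliseconds since the epoch
}


def decode(es_type, raw):
    """Decode a source byte array into a Python value for the given type."""
    fmt = FORMATS[es_type]
    expected = struct.calcsize(fmt)
    if len(raw) != expected:
        raise ValueError(
            f"{es_type} needs {expected} bytes, got {len(raw)}"
        )
    return struct.unpack(fmt, raw)[0]
```

A value whose byte length does not match the width of its target type cannot be decoded this way, which is why the source data must be written with the width that the mapping expects.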

Restrictions

MapR-DB can convert these data types only if they meet these requirements:
  • boolean: Boolean values must be represented by single bytes.
  • date: Timestamps must be long integers representing the time in milliseconds since the epoch.
  • geolocation: Geolocations must be encoded as UTF-8 strings that contain either a pair of latitude and longitude coordinates or a geohash.
  • IP address: IP addresses must be UTF-8 encoded strings.
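
The following Python sketch shows one way a writer application might produce source bytes that satisfy the requirements above. All function names are hypothetical. The big-endian byte order, the 0x01 byte for true, and the "lat,lon" string layout (one of the string forms that Elasticsearch accepts for geo points) are assumptions; confirm them against your own data before relying on them.

```python
import ipaddress
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_boolean(value):
    # Exactly one byte. Assumption: 0x01 for true, 0x00 for false.
    return b"\x01" if value else b"\x00"


def encode_date(moment):
    # 8-byte long holding milliseconds since the epoch (big-endian assumed).
    # `moment` must be timezone-aware; a naive datetime raises TypeError.
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    return struct.pack(">q", millis)


def encode_geolocation(lat, lon):
    # UTF-8 string "lat,lon". A geohash string would be encoded the same way.
    return f"{lat},{lon}".encode("utf-8")


def encode_ip(address):
    # Validate first, then store the textual form as a UTF-8 string.
    return str(ipaddress.ip_address(address)).encode("utf-8")
```

For example, encode_geolocation(40.7128, -74.006) yields the bytes of the string "40.7128,-74.006".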

If your data cannot meet these requirements, then you must write Java routines to tell MapR-DB how to perform custom conversions.

To specify how to convert source data to the Elasticsearch data types that MapR-DB supports for indexing in Elasticsearch, follow these steps for each source table:

  1. Create the index in Elasticsearch by calling Elasticsearch’s create index API. See Index API in the Elasticsearch documentation.
  2. Call Elasticsearch’s put mapping API to register specific data-type mapping definitions for the type. When MapR-DB first puts data into the index, it calls Elasticsearch’s get mapping API to retrieve the mapping definitions. See Put Mapping in the Elasticsearch documentation.
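
The two steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive procedure: the host, index name, type name, and field names are all hypothetical, and the URL and body shapes follow the older Elasticsearch releases that still use mapping types, so check the Elasticsearch documentation for the version you run. The functions only build the HTTP requests; passing one to urllib.request.urlopen would send it.

```python
import json
import urllib.request
from urllib.parse import quote


def create_index_request(base_url, index):
    """Build the PUT request that creates an index (step 1)."""
    url = f"{base_url}/{quote(index, safe='')}"
    return urllib.request.Request(url, method="PUT")


def put_mapping_request(base_url, index, doc_type, properties):
    """Build the PUT request that registers a type mapping (step 2)."""
    url = (
        f"{base_url}/{quote(index, safe='')}"
        f"/_mapping/{quote(doc_type, safe='')}"
    )
    body = json.dumps({doc_type: {"properties": properties}}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(
        url, data=body, headers=headers, method="PUT"
    )


# Hypothetical field names; how fields correspond to columns in your
# source table depends on your replication configuration.
EXAMPLE_PROPERTIES = {
    "active": {"type": "boolean"},
    "signup": {"type": "date"},
    "location": {"type": "geo_point"},
    "last_ip": {"type": "ip"},
    "avatar": {"type": "binary"},
}

index_req = create_index_request("http://localhost:9200", "customers")
mapping_req = put_mapping_request(
    "http://localhost:9200", "customers", "profile", EXAMPLE_PROPERTIES
)
```

Building the requests separately from sending them makes it easy to inspect the mapping body before anything reaches the cluster.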

If you have not done so already, register your Elasticsearch cluster or clusters with your MapR source cluster.

If you have already registered your Elasticsearch cluster, configure replication to types in Elasticsearch.

If you ever change how your source data is mapped to Elasticsearch data types, you must restart the MapR gateways that you are using for indexing. Follow these steps:
  1. Pause indexing of your MapR-DB source tables. To get a list of the Elasticsearch types that are used for each source table, use the maprcli table replica elasticsearch list command. For each Elasticsearch type, issue the maprcli table replica elasticsearch pause command to pause indexing.
  2. Restart the MapR gateways that you are using for indexing. See the section "On clusters where gateways are running" in Configuring Gateways for Table and Stream Replication.
  3. Resume indexing by issuing the command maprcli table replica elasticsearch resume for each Elasticsearch type that you are indexing your data in.
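
If you script this procedure, the ordering matters: pause every type, restart the gateways, then resume every type. The Python sketch below only assembles that ordered plan. The reindex_plan function is a hypothetical helper, the restart step is a placeholder because the restart procedure is described in the gateway configuration section, and the command options are passed through untouched because this section does not list them; see the maprcli reference for the options that identify a source table and an Elasticsearch type.

```python
BASE = ["maprcli", "table", "replica", "elasticsearch"]


def reindex_plan(type_options, restart_step="<restart MapR gateways>"):
    """Return the ordered steps: pause all, restart gateways, resume all.

    type_options holds one list of command-line options per Elasticsearch
    type. The options are supplied by the caller and are not validated.
    """
    pauses = [BASE + ["pause"] + list(opts) for opts in type_options]
    resumes = [BASE + ["resume"] + list(opts) for opts in type_options]
    return pauses + [[restart_step]] + resumes


# Placeholder option lists; replace them with the real options for each type.
plan = reindex_plan([["<options for type A>"], ["<options for type B>"]])
for step in plan:
    print(" ".join(step))
```

Each maprcli entry in the plan is a list of arguments, so it could be handed to subprocess.run once real options are filled in.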