MapR 5.0 Documentation : Gateways for Indexing MapR-DB Data in Elasticsearch

When you index MapR-DB tables in Elasticsearch, MapR-DB replicates table updates to corresponding Elasticsearch types. The MapR-DB tables are in MapR clusters, and the types are in Elasticsearch clusters. Gateways receive table updates and pass them to nodes in Elasticsearch clusters.

You can place gateways on existing nodes in your source MapR cluster, on existing nodes in your Elasticsearch cluster, or on nodes that are not part of either cluster. Choose the location that gives the best network performance both from the source MapR cluster to the gateway and from the gateway to the Elasticsearch cluster.
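One way to compare candidate locations is to measure both network hops for each candidate and rank the candidates by the slower hop, because every table update has to cross both. The following Python sketch illustrates that idea only. The host names, the throughput figures, and the names Candidate, end_to_end_mb_per_s, and rank_candidates are hypothetical and are not part of any MapR tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A possible gateway host with hypothetical measured throughput."""
    host: str
    source_to_gateway_mb_per_s: float  # source MapR cluster -> gateway
    gateway_to_es_mb_per_s: float      # gateway -> Elasticsearch cluster


def end_to_end_mb_per_s(candidate):
    # An update must cross both hops, so the slower hop caps the rate.
    return min(candidate.source_to_gateway_mb_per_s,
               candidate.gateway_to_es_mb_per_s)


def rank_candidates(candidates):
    """Return candidates sorted from best to worst end-to-end throughput."""
    return sorted(candidates, key=end_to_end_mb_per_s, reverse=True)


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    measured = [
        Candidate("mapr-node-3", 900.0, 300.0),
        Candidate("standalone-1", 600.0, 550.0),
        Candidate("es-node-7", 250.0, 950.0),
    ]
    for candidate in rank_candidates(measured):
        print(candidate.host, end_to_end_mb_per_s(candidate))
```

Ranking by the minimum of the two hops, rather than by their sum or average, reflects the assumption that the slower link is the bottleneck.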

When gateways are on nodes that are part of an Elasticsearch cluster, the gateways are invisible to Elasticsearch. All management of gateways is done from the source MapR cluster.

Wherever you place the MapR gateways that you use for indexing, they become part of the source MapR cluster for the following two reasons:

  • After you install the mapr-gateway package on a node, you run the configure.sh script, supplying the name of the MapR cluster that the gateway belongs to as the value for the -N parameter.
  • When you tell your source MapR cluster where the gateways are, you list the gateway nodes together with the name of the MapR cluster that they belong to.

As a result, you can manage gateways from the maprcli when you are logged into the source MapR cluster. Also, gateways are able to access information on the source MapR cluster about source tables and their corresponding Elasticsearch types.
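The two steps above can be sketched as a pair of command lines. The Python helper below only builds and prints the commands; it does not run them. Only the -N parameter of configure.sh comes from this page. The install path, the -C and -Z options, the maprcli cluster gateway set subcommand with its -dstcluster and -gateways options, and all host and cluster names are assumptions recalled from general MapR usage, so verify them against the reference documentation for your release before use.

```python
import shlex

# Usual location of the script on a MapR node; treated as an assumption here.
CONFIGURE_SH = "/opt/mapr/server/configure.sh"


def configure_gateway_cmd(cluster_name, cldb_nodes, zookeeper_nodes):
    """Build the configure.sh call to run after installing mapr-gateway.

    -N (cluster name) is described on this page. -C (CLDB nodes) and
    -Z (ZooKeeper nodes) are assumed options; check your release.
    """
    return [
        CONFIGURE_SH,
        "-N", cluster_name,
        "-C", ",".join(cldb_nodes),
        "-Z", ",".join(zookeeper_nodes),
    ]


def register_gateways_cmd(cluster_name, gateway_hosts):
    """Build the maprcli call that tells the source cluster where its
    gateways are. Subcommand and option names are assumptions."""
    return [
        "maprcli", "cluster", "gateway", "set",
        "-dstcluster", cluster_name,
        "-gateways", " ".join(gateway_hosts),
    ]


if __name__ == "__main__":
    # Hypothetical cluster and host names.
    cluster = "source.cluster.example"
    print(shlex.join(configure_gateway_cmd(cluster, ["cldb1"], ["zk1", "zk2"])))
    print(shlex.join(register_gateways_cmd(cluster, ["gw1", "gw2", "gw3"])))
```

Both commands take the same cluster name, which is what makes the gateways part of the source MapR cluster regardless of where they physically run.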

Examples of Gateway Placement for Indexing in Elasticsearch

The following diagrams illustrate the choices for gateway placement. Each diagram shows two clusters: a source MapR cluster, where the tables being indexed are located, and an Elasticsearch cluster, where the corresponding types are located. Each cluster consists of nine nodes. The nine MapR nodes (in orange) run the MapR filesystem and store table data. The nine Elasticsearch nodes (in yellow) store shards of the index where the types are located. Nodes where gateways are running are shown in blue.

Gateways on the Source MapR Cluster

In the first diagram, the gateways are installed on three of the nodes in the MapR cluster.

Gateways on Independent Nodes That Are Added to the Source MapR Cluster

In the next diagram, the gateways are installed on servers that were not previously part of the MapR cluster. Only the gateway services are installed on these new nodes. However, these nodes logically become part of the MapR cluster, as indicated by the dotted line extending from this cluster. Because gateways consume CPU and network resources, placing gateways on dedicated nodes allows for higher throughput rates than the previous configuration.

Gateways on the Elasticsearch Cluster

In the final diagram, the gateways are installed on nodes that are part of the Elasticsearch cluster. As in the previous diagram, these nodes logically become part of the MapR cluster, as again indicated by the dotted line extending from the MapR cluster. The gateways are managed from that cluster only. No management of the gateways needs to take place through Elasticsearch.

Types of Elasticsearch Clients for Gateways to Use

After deciding where to place your gateways, decide which type of Elasticsearch client the gateways will use to connect to the Elasticsearch cluster. Two types of Elasticsearch client are available for this purpose: transport clients and node clients.

All of the gateways that you use to index data from a source MapR cluster must use the same type of client. For example, if you are using four gateways, you cannot use two with transport clients and two with node clients. All four must use either transport clients or node clients.
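The rule above is easy to check before deployment if you keep a record of which client type each gateway is planned to use. The following Python sketch assumes such a record exists as a simple mapping. The names ClientType and check_uniform_client_type, and the gateway host names, are hypothetical and do not correspond to any MapR or Elasticsearch API.

```python
from enum import Enum


class ClientType(Enum):
    TRANSPORT = "transport"
    NODE = "node"


def check_uniform_client_type(gateways):
    """Return the client type shared by all gateways.

    gateways maps a gateway host name to its planned ClientType.
    Raises ValueError if the mapping is empty or the types are mixed.
    """
    if not gateways:
        raise ValueError("no gateways configured")
    types_in_use = set(gateways.values())
    if len(types_in_use) > 1:
        mixed = ", ".join(sorted(t.value for t in types_in_use))
        raise ValueError(f"gateways mix client types: {mixed}")
    return types_in_use.pop()


if __name__ == "__main__":
    # Hypothetical plan: four gateways, all with transport clients.
    plan = {f"gw{i}": ClientType.TRANSPORT for i in range(1, 5)}
    print(check_uniform_client_type(plan).value)
```

With two gateways set to ClientType.TRANSPORT and two set to ClientType.NODE, the function raises ValueError, which matches the four-gateway example in the text.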

Transport clients

When gateways use transport clients to connect to an Elasticsearch cluster, the gateways connect to one or more transport nodes in that cluster. The gateways pass replicated updates from the source MapR cluster to the transport nodes. These nodes are responsible for distributing the updates to the correct nodes in the Elasticsearch cluster.

Node clients

When gateways use node clients to connect to an Elasticsearch cluster, the gateways themselves distribute updates to the correct nodes in the Elasticsearch cluster.
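The difference between the two client types comes down to which party decides where an update goes. The toy Python model below illustrates this with the path one update takes in each case. It is a simplification under stated assumptions, not how the gateways are implemented: the one-shard-per-node layout and the node names are hypothetical, and zlib.crc32 is only a deterministic stand-in for the routing hash that Elasticsearch applies to a document before taking the result modulo the number of primary shards.

```python
import zlib

NUM_SHARDS = 9
# Hypothetical layout: primary shard i lives on node "es-i".
SHARD_TO_NODE = {i: f"es-{i}" for i in range(NUM_SHARDS)}


def shard_for(doc_id):
    # Stand-in routing function: hash the ID, then take it modulo the
    # shard count. crc32 is used only because it is deterministic.
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def path_with_transport_client(doc_id, transport_node):
    """The gateway hands the update to a transport node, which forwards
    it to the node that holds the target shard (if that is another node)."""
    target = SHARD_TO_NODE[shard_for(doc_id)]
    path = ["gateway", transport_node]
    if target != transport_node:
        path.append(target)
    return path


def path_with_node_client(doc_id):
    """The gateway itself works out the target node and sends the update
    there directly."""
    return ["gateway", SHARD_TO_NODE[shard_for(doc_id)]]


if __name__ == "__main__":
    row_key = "user:1001"  # hypothetical row key used as the document ID
    print(path_with_transport_client(row_key, "es-0"))
    print(path_with_node_client(row_key))
```

In this model both paths end at the same node. The transport client path can include one extra hop, while the node client path shifts the routing work onto the gateway.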

For more information about these clients and for instructions about how to use them, see the Elasticsearch documentation at https://www.elastic.co/guide/index.html.