When you index a MapR-DB table, you set up replication from that table to a type in an Elasticsearch cluster. After an initial load of data from the source table into the type, updates to the source table are replicated immediately to the type. Updates are not replicated in batches but as they happen.
Replication of data to Elasticsearch indexes is asynchronous. MapR-DB does not wait to receive confirmation that changes have been replicated before it notifies client applications that requested operations on the MapR-DB database are complete.
- Configure two or more MapR gateways to handle communications between MapR-DB and each Elasticsearch cluster. See MapR Gateways.
- Ensure that your Elasticsearch cluster is registered with your source MapR-DB cluster. See Registering Elasticsearch Clusters with MapR Clusters.
- Run the maprcli table info command on the source table to verify that you have the following permissions:
  - readperm, which is required for reading from the table.
  - replperm, which is required for replicating from the table.
- Ensure that the _source field in Elasticsearch is enabled for all documents.
- You cannot replicate data from more than one MapR-DB table into a single Elasticsearch type.
- The replication of deletes to Elasticsearch types is not supported.
- Versioning is not supported in Elasticsearch indexes. In MapR-DB (as in HBase), tables can store an unbounded number of cells that share the same row and column and differ only in the version dimension of the cell address, the version being specified as a long integer. However, in Elasticsearch, only one version of indexed cell data is retained.
- Do not replicate puts that are made with timestamps. Because Elasticsearch retains only the most recently indexed value for a cell, an Elasticsearch type will fall out of synchronization with its corresponding source table if any puts with timestamps are made to the table out of order or replicated to the type out of order.
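The last restriction can be illustrated with a small simulation. This is only a sketch and does not touch MapR-DB or Elasticsearch: it assumes that a read from the table returns the version with the highest timestamp (the default behavior in HBase), while the type keeps whichever value was indexed last. The values and timestamps are made up for illustration.

```shell
# Two puts to the same cell, listed in arrival order as "timestamp value".
# The second put carries an older timestamp than the first (out of order).
puts="200 new
100 old"

# Table view (assumed): the version with the highest timestamp wins.
table_value=$(printf '%s\n' "$puts" | sort -k1,1n | tail -n 1 | cut -d' ' -f2)

# Type view: the most recently indexed value wins, regardless of timestamp.
type_value=$(printf '%s\n' "$puts" | tail -n 1 | cut -d' ' -f2)

echo "table=$table_value type=$type_value"
```

Under these assumptions the table reports the value stamped 200 while the type holds the value stamped 100, so the two are out of synchronization.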
On the source MapR cluster, run the command maprcli table replica elasticsearch autosetup.
This command registers the destination type as a replica of the source table, copies the content of the source table into the type, and then starts the replication stream to keep the type updated.
The maprcli table replica elasticsearch autosetup command starts a MapReduce job. The duration of the job depends on the size of the source table and the number of columns that you are indexing. Moreover, the volume of data and the speed at which the Elasticsearch type is populated could perceptibly slow the performance of other processes running at the same time on the Elasticsearch cluster. The less data there is to copy to the type, the sooner the MapReduce job finishes and the fewer resources it consumes on the Elasticsearch cluster.
By default, this command causes all column families to be replicated. If you want to specify a subset of column families, individual columns, or both, use the -columns parameter. Columns that you specify do not have to exist in the source table at the time that you run this command; you can create them later. However, column families that you specify must exist in the source table at the time that you run this command.
maprcli table replica elasticsearch autosetup -path /mapr/sanfrancisco/customers -target myescluster -index myproduct -type customers -columns personal,purchase,review:number_of_stars,review:date
This example causes the indexing of all of the columns in column families personal and purchase, as well as the columns number_of_stars and date in the column family review, of the MapR-DB source table customers in the MapR cluster sanfrancisco.
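The -columns value in the example above mixes whole column families (family) with individual columns (family:column), separated by commas. As a minimal sketch, the value can be assembled from a list and the full command printed for review before it is run. The join_columns helper is hypothetical and not part of maprcli; the paths, names, and parameters are the ones from the example.

```shell
# Hypothetical helper: join column-family and family:column entries with
# commas, which is the format the -columns value has in the example.
join_columns() {
  local IFS=,
  printf '%s\n' "$*"
}

columns=$(join_columns personal purchase review:number_of_stars review:date)

# Dry run: print the command instead of executing it.
echo maprcli table replica elasticsearch autosetup \
  -path /mapr/sanfrancisco/customers \
  -target myescluster -index myproduct -type customers \
  -columns "$columns"
```

Removing the leading echo runs the command. Because the initial load starts a MapReduce job, reviewing the column list first helps keep the amount of copied data small.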
Client applications can now start updating the source table in MapR-DB and querying the indexed data in Elasticsearch.
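As a quick check that documents are arriving, you could query the type over the Elasticsearch HTTP search API. The sketch below only builds and prints the URL. The host name is hypothetical, 9200 is the default Elasticsearch HTTP port, and the URL layout assumes an Elasticsearch version that still exposes types in the request path.

```shell
# Hypothetical host: replace with a node in your Elasticsearch cluster.
es_host="es-node1.example.com"
# Index and type names taken from the autosetup example.
index="myproduct"
type="customers"

# Search endpoint for the type; with no query it returns a first page of hits.
search_url="http://${es_host}:9200/${index}/${type}/_search?pretty"
echo "$search_url"

# To run the query against a live cluster:
# curl -s "$search_url"
```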
What to do next
If you ever need to change the selection of columns or column families that you want to index, use the maprcli table replica elasticsearch edit command.
To see statistics about replication from the source table, including the number of pending puts and the number of pending bytes to transfer, run the maprcli table replica elasticsearch list command, specifying the source table with the