Optimizing MapR Database Tables Search by ID

Starting from the 1904 release (MEP 6.0.2, MEP 6.1.1, and MEP 6.2.0), search by ID is supported with Hive MapR Database JSON tables.

Property of Optimization

The property name is hive.mapr.db.json.fetch.by.id.task.conversion and the value has a boolean type and by default is set to true, which means it is enabled.
To disable optimization, set hive.mapr.db.json.fetch.by.id.task.conversion to false.

Conditions for Optimization

This optimizer is designed for queries such as:
SELECT *
FROM <mapr_db_json_table>
WHERE _id = <constant_string_value>;
or:
SELECT *
FROM <mapr_db_json_table>
WHERE _id = <constant_string_value> AND (<condition_1>) AND (<condition_2>) ... AND (<condition_N>);
or:
SELECT *
FROM <mapr_db_json_table>
WHERE <Constant false operator>
where _id is a key column of MapR Database JSON table. It provides usage of the findById() method of the MapR Database JSON table. The following functionality is not supported:
  • joins
  • group by
  • distinct
  • lateral view
  • subquery
  • create table as select (CTAS) or insert
  • analyze
  • single source
The predicate is not actually a part of the filter, so it is ignored by push down:
SELECT * FROM t WHERE (CASE WHEN _id = 'value_a' THEN 2 ELSE 4 END) > 3;

Using Optimization

  1. Consider the following MapR Database JSON table:
    CREATE TABLE t(doc_id string, col1 string, col2 string) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES("maprdb.table.name" = "/user/mapr/db_json_table","maprdb.column.id" = "doc_id");
  2. Run the EXPLAIN command:
    EXPLAIN SELECT col1 FROM t WHERE doc_id='id_004';
  3. The following output is produced:
    STAGE DEPENDENCIES:
     Stage-0 is a root stage
                            
    STAGE PLANS:
     Stage: Stage-0
      MapR DB JSON Fetch By Id Operator
      limit: -1
      Processor Tree:
      TableScan
      alias: t_small
      filterExpr: (doc_id = 'id_004') (type: boolean)
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Filter Operator
       predicate: (doc_id = 'id_004') (type: boolean)
       Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
       Select Operator
       expressions: col1 (type: string)
       outputColumnNames: _col0
       Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
       ListSink
    An important part of a query plan is that it shows if optimization is available for the query:
    STAGE PLANS:
    Stage: Stage-0
    MapR DB JSON Fetch By Id Operator