Understanding the MERGE Statement

You can use the MERGE statement to efficiently perform record-level INSERT and UPDATE operations within Hive tables.

The MERGE statement can be a key tool of MapR cluster data management. It is based on ANSI-standard SQL.

Simple merge.maprdb.column.id is the join key

Consider merging the following example source and target tables:

Table 1. customer_source
id first_name last_name age
001 Dorothi Hogward 7777
002 Alex Bowee 7777
088 Robert Dowson 25
Table 2. customer_db_json_target
id first_name last_name age
001 John Smith 45
002 Michael Watson 27
003 Den Brown 33
You can use the following SQL-standard MERGE statement:
MERGE into customer_db_json_target trg 
USING customer_source src 
ON src.id = trg.id 
WHEN MATCHED THEN UPDATE SET age = src.age 
WHEN NOT MATCHED THEN 
INSERT VALUES (src.id, src.first_name, src.last_name, src.age);
The result is:
id first_name last_name age
001 John Smith 7777
002 Michael Watson 7777
003 Den Brown 33
088 Robert Dowson 25
Note: The age column is updated and a new id column is inserted.

Simple merge.maprdb.column.id is not the join key

Merging when merge.maprdb.column is not the join key is not recommended.

Deleting while merging

Deletions are not supported in a MERGE statement. This example:
MERGE INTO  customer_db_json old 
USING customer_new new ON new.id = old.id 
WHEN MATCHED AND old.age > 10 THEN DELETE;
Will raise the following exception:
Error: Error while compiling statement: FAILED: SemanticException Deletes are not supported for MapR DB JSON tables (state=42000,code=40000)

Multiple source rows match a given target row (cardinality violation)

Consider merging the two tables customer_db_json and customer_new:

Table 3. customer_db_json
id first_name last_name age
001 John Smith 45
002 Michael Watson 27
003 Den Brown 33

And:

Table 4. customer_new
id first_name last_name age
001 Dorothi Hogward 77
001 Dorothi Hogward 77
088 Robert Dowson 25
To MERGE customer_new and customer_db_json:
MERGE INTO customer_db_json trg 
USING customer_new src ON src.id = trg.id 
WHEN MATCHED THEN UPDATE 
SET first_name = src.first_name, 
last_name = src.last_name 
WHEN NOT MATCHED THEN INSERT VALUES
(src.id, src.first_name, src.last_name, src.age);

This example causes an exception because of duplicate values in the id column in the customer_new table:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0)

To avoid cardinality violation, set hive.merge.cardinality.check=false, but in this case the result is unpredictable because there is no rule which defines the order of duplicated data that will be inserted while using the MERGE statement.

Merge on mixed data types

The merge operation also supports mixed data types, such as arrays, maps, and structures.

Consider two tables mixed_types_source and mixed_types_target:

Table 5. mixed_types_source
doc_id user_info
1 {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"}
2 {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}
3 {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"}
4 {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"}

And:

Table 6. mixed_types_target
id user_info
1 {"name":"Lexx","surname":"Comfuzer","age":31,"gender":"MALE"}
To merge mixed_types_source and mixed_types_target:
MERGE INTO mixed_types_target trg 
USING mixed_types_source src 
ON src.doc_id = old.doc_id 
WHEN MATCHED THEN UPDATE 
SET user_info = src.user_info 
WHEN NOT MATCHED THEN INSERT VALUES 
(src.doc_id, src.user_info);

The result is:

Table 7. Result of MERGE operator
id first_name
1 {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"}
2 {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}
3 {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"}
4 {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"}
It is important to note that you cannot update only a part of a complex structure field. For example, you have a structure stored as one field in a Hive table:
{"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}

You cannot update only the age field in the structure. You can only replace all values of the structure with new ones. See Understanding the UPDATE Statement for details.

Merge into external MapR Database JSON tables

The MERGE operator is also available for external MapR Database JSON tables. You can use the MERGE statement to insert and update values in external MapR database JSON table targets.

Merge into partitioned MapR Database JSON tables

Partitioned MapR Database JSON tables are not supported.

Merge into temporary MapR Database JSON tables

The MERGE operator is also available for temporary MapR Database JSON tables. Use temporary tables as target tables for merge. No additional syntax is needed.