Understanding the MERGE Statement

This section describes how to use the MERGE statement to efficiently perform record-level INSERT and UPDATE operations within Hive tables.

The MERGE statement can be a key tool of MapR cluster data management. It is based on ANSI-standard SQL.

Simple merge.maprdb.column.id is the join key

Consider two tables customer_source and customer_db_json_target. See Table 1 for source table customer_source:

Table 1. Source table customer_source
id first_name last_name age
001 Dorothi Hogward 7777
002 Alex Bowee 7777
088 Robert Dowson 25

See Table 2 for target table customer_db_json_target:

Table 2. Target table customer_db_json_target
id first_name last_name age
001 John Smith 45
002 Michael Watson 27
003 Den Brown 33
The following SQL statement is an example of valid MERGE usage:
MERGE into customer_db_json_target trg 
USING customer_source src 
ON src.id = trg.id 
WHEN MATCHED THEN UPDATE SET age = src.age 
WHEN NOT MATCHED THEN 
INSERT VALUES (src.id, src.first_name, src.last_name, src.age);
The result after merge into target is:
Table 3. Result of MERGE operator
id first_name last_name age
001 John Smith 7777
002 Michael Watson 7777
003 Den Brown 33
088 Robert Dowson 25
Note: The age column is updated and a new id column is inserted.

Simple merge.maprdb.column.id is not the join key

This type of join is not recommended. Consider two tables client_mapr_db_json_target and client_source:

Table 4. Contents of client_mapr_db_json_target
id first_name last_name age
1 John Robin 32
2 Martin Maxx 40
3 Lee Harwin 25

And:

Table 5. Contents of customer_source
id first_name last_name age
111 FN1 LN1 32
222 FN2 LN2 40
3 FN3 LN3 11
Using the MERGE operator when maprdb.column.id is not the join key:
MERGE INTO client_mapr_db_json_target trg 
USING client_source src 
ON trg.age = src.age 
WHEN MATCHED THEN UPDATE 
SET first_name = src.first_name, 
last_name  = src.last_name
WHEN NOT MATCHED THEN INSERT VALUES 
(src.id, src.first_name, src.last_name, src.age);

The result after merge into target is:

Table 6. Result of MERGE operator
id first_name last_name age
1 FN1 LN1 32
2 FN2 LN2 40
3 FN3 FN3 11
Here the join key is column is age. Two values of age (32 and 40) are the same in tables client_mapr_db_json_target and client_source. You want to update columns first_name and last_name if age is the same, and insert the row if age differs. You expect row (3, FN3, FN3, 11) to be inserted into client_mapr_db_json_target table, but the table already has a row with id 3:
(3, Lee, Harwin, 25)
Since MapR Database JSON tables do not allow two rows with the same merge.maprdb.column.id, row (3, Lee, Harwin, 25) is replaced with (3, FN3, FN3, 11), even if the MERGE operator consists of WHEN NOT MATCHED THEN INSERT VALUES and you may expect it to be inserted.

Deleting while merging

Deletions are not supported in a MERGE statement. This example:
MERGE INTO  customer_db_json old 
USING customer_new new ON new.id = old.id 
WHEN MATCHED AND old.age > 10 THEN DELETE;
Will raise the following exception:
Error: Error while compiling statement: FAILED: SemanticException Deletes are not supported for MapR DB JSON tables (state=42000,code=40000)

Multiple source rows match a given target row (cardinality violation)

Consider two tables customer_db_json and customer_new:

Table 7. Contents of customer_db_json
id first_name last_name age
001 John Smith 45
002 Michael Watson 27
003 Den Brown 33

And:

Table 8. Contents of customer_new
id first_name last_name age
001 Dorothi Hogward 77
001 Dorothi Hogward 77
088 Robert Dowson 25
To MERGE customer_new and customer_db_json:
MERGE INTO customer_db_json trg 
USING customer_new src ON src.id = trg.id 
WHEN MATCHED THEN UPDATE 
SET first_name = src.first_name, 
last_name = src.last_name 
WHEN NOT MATCHED THEN INSERT VALUES
(src.id, src.first_name, src.last_name, src.age);

This example causes an exception because of duplicate values in the id column in the customer_new table:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0)

To avoid cardinality violation, set hive.merge.cardinality.check=false, but in this case the result is unpredictable because there is no rule which defines the order of duplicated data that will be inserted while using the MERGE statement.

Merge on mixed data types

The MERGE operation also supports mixed data types, such as arrays, maps, and structures.

Consider two tables mixed_types_source and mixed_types_target:

Table 9. Contents of mixed_types_source
doc_id user_info
1 {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"}
2 {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}
3 {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"}
4 {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"}

And:

Table 10. Contents of mixed_types_target
id user_info
1 {"name":"Lexx","surname":"Comfuzer","age":31,"gender":"MALE"}
To MERGE mixed_types_source and mixed_types_target:
MERGE INTO mixed_types_target trg 
USING mixed_types_source src 
ON src.doc_id = old.doc_id 
WHEN MATCHED THEN UPDATE 
SET user_info = src.user_info 
WHEN NOT MATCHED THEN INSERT VALUES 
(src.doc_id, src.user_info);

The result after merge into target is:

Table 11. Result of MERGE operator
id first_name
1 {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"}
2 {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}
3 {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"}
4 {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"}
It is important to note that you cannot update only a part of a complex structure field. For example, you have a structure stored as one field in a Hive table:
{"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}

You cannot update only the age field in the structure. You can only replace all values of the structure with new ones. See Understanding the UPDATE Statement for details.

Merge into external MapR Database JSON tables

The MERGE operator is also available for external MapR Database JSON tables. Use external tables as target tables for merge. No additional syntax is needed.

Merge into partitioned MapR Database JSON tables

Partitioned MapR Database JSON tables are not supported.

Merge into temporary MapR Database JSON tables

The MERGE operator is also available for temporary MapR Database JSON tables. Use temporary tables as target tables for merge. No additional syntax is needed.