Understanding the MERGE Statement
You can use the MERGE
statement to efficiently perform record-level
INSERT
and UPDATE
operations within Hive tables.
The MERGE
statement can be a key tool of MapR cluster data management. It is based on ANSI-standard SQL.
MERGE
statement is described by the following scenarios: - Simple merge.maprdb.column.id is the join key
- Simple merge.maprdb.column.id is not the join key
- Deleting while merging
- Multiple source rows match a given target row (cardinality violation)
- Merge on mixed data types
- Merge into external MapR Database JSON tables
- Merge into partitioned MapR Database JSON tables
- Merge into temporary MapR Database JSON tables
Simple merge.maprdb.column.id
is the join
key
Consider merging the following example source and target tables:
id | first_name | last_name | age |
---|---|---|---|
001 | Dorothi | Hogward | 7777 |
002 | Alex | Bowee | 7777 |
088 | Robert | Dowson | 25 |
id | first_name | last_name | age |
---|---|---|---|
001 | John | Smith | 45 |
002 | Michael | Watson | 27 |
003 | Den | Brown | 33 |
MERGE into customer_db_json_target trg
USING customer_source src
ON src.id = trg.id
WHEN MATCHED THEN UPDATE SET age = src.age
WHEN NOT MATCHED THEN
INSERT VALUES (src.id, src.first_name, src.last_name, src.age);
id | first_name | last_name | age |
---|---|---|---|
001 | John | Smith | 7777 |
002 | Michael | Watson | 7777 |
003 | Den | Brown | 33 |
088 | Robert | Dowson | 25 |
id
column is inserted.Simple merge.maprdb.column.id
is not the join key
Merging when merge.maprdb.column
is not the join key is not recommended.
Deleting while merging
MERGE
statement. This example:
MERGE INTO customer_db_json old
USING customer_new new ON new.id = old.id
WHEN MATCHED AND old.age > 10 THEN DELETE;
Will raise the following exception:
Error: Error while compiling statement: FAILED: SemanticException Deletes are not supported for MapR DB JSON tables (state=42000,code=40000)
Multiple source rows match a given target row (cardinality violation)
Consider merging the two tables customer_db_json
and
customer_new
:
id | first_name | last_name | age |
---|---|---|---|
001 | John | Smith | 45 |
002 | Michael | Watson | 27 |
003 | Den | Brown | 33 |
And:
id | first_name | last_name | age |
---|---|---|---|
001 | Dorothi | Hogward | 77 |
001 | Dorothi | Hogward | 77 |
088 | Robert | Dowson | 25 |
MERGE
customer_new
and customer_db_json
:
MERGE INTO customer_db_json trg
USING customer_new src ON src.id = trg.id
WHEN MATCHED THEN UPDATE
SET first_name = src.first_name,
last_name = src.last_name
WHEN NOT MATCHED THEN INSERT VALUES
(src.id, src.first_name, src.last_name, src.age);
This example causes an exception because of duplicate values in the id
column in the customer_new
table:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0)
To avoid cardinality violation, set hive.merge.cardinality.check=false
,
but in this case the result is unpredictable because there is no rule which defines the
order of duplicated data that will be inserted while using the MERGE
statement.
Merge on mixed data types
The merge operation also supports mixed data types, such as arrays, maps, and structures.
Consider two tables mixed_types_source
and mixed_types_target
:
doc_id | user_info |
---|---|
1 | {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"} |
2 | {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"} |
3 | {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"} |
4 | {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"} |
And:
id | user_info |
---|---|
1 | {"name":"Lexx","surname":"Comfuzer","age":31,"gender":"MALE"} |
mixed_types_source
and mixed_types_target
:
MERGE INTO mixed_types_target trg
USING mixed_types_source src
ON src.doc_id = old.doc_id
WHEN MATCHED THEN UPDATE
SET user_info = src.user_info
WHEN NOT MATCHED THEN INSERT VALUES
(src.doc_id, src.user_info);
The result is:
id | first_name |
---|---|
1 | {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"} |
2 | {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"} |
3 | {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"} |
4 | {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"} |
{"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}
You cannot update only the age
field in the structure. You can only
replace all values of the structure with new ones. See Understanding the UPDATE Statement for details.
Merge into external MapR Database JSON tables
The MERGE
operator is also available for external MapR Database JSON tables. You can use the MERGE statement to insert and
update values in external MapR database JSON table targets.
Merge into partitioned MapR Database JSON tables
Partitioned MapR Database JSON tables are not supported.
Merge into temporary MapR Database JSON tables
The MERGE
operator is also available for temporary MapR Database JSON tables. Use temporary tables as target tables for
merge. No additional syntax is needed.