Understanding the INSERT INTO Statement

This section describes how to use the INSERT INTO statement to insert or overwrite rows in nested MapR Database JSON tables, using the Hive connector.

Note: The output shown in these examples is for illustration only; actual Hive CLI output varies, depending on your specific situation.

Single-row insert

You can use the INSERT INTO statement to insert a single table row into a nested MapR Database table using one of two methods.

For example, imagine that you have the following Hive MapR Database JSON table, nested_data_insert:

 CREATE TABLE nested_data_insert 
             ( 
                          entry STRING, 
                          num INT, 
                          postal_addresses MAP <STRING, 
                          struct <USER_ID:STRING,ADDRESS:STRING,ZIP:STRING,COUNTRY:STRING>> 
             ) 
             stored BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' tblproperties
             ( 
                          "maprdb.table.name" = "/nested_data_insert", 
                          "maprdb.column.id" = "entry" 
             );
  • You can insert the new row into your table by using a dummy table:
    WITH dummy_table AS
      (SELECT '001' AS KEY,
              '1' AS num,
              MAP ('Adam',
                   Named_struct ('user_id', '1', 'address', '3205 Woodlake ct', 'zip', '45040', 'country', 'Usa'),
                   'Wilfred',
                   Named_struct ('user_id', '2', 'address', '777 Brockton Avenue', 'zip', '34000', 'country', 'Ita')) AS postal_addresses)
    INSERT INTO nested_data_insert
    SELECT *
    FROM dummy_table;
  • Alternatively, you can insert the new row into your table by using a SELECT statement:
    INSERT INTO TABLE nested_data_insert
    SELECT '002',
           '2',
           MAP ('Bill',
                Named_struct ('user_id', '1', 'address', '328 Virginia Ave', 'zip', '54956', 'country', 'Bol'),
                'Stiv',
                Named_struct ('user_id', '2', 'address', 'Schererville', 'zip', '46375', 'country', 'Efi'));
After you insert data, you should verify that the data is inserted in both Hive and MapR Database JSON tables:
  • Verify the insertion into the Hive table by using the SELECT * FROM syntax.
    SELECT * FROM nested_data_insert;

    Sample output:

    Table 1. Result of the insert queries 1.1 and 1.2:
    entry num postal_address
      USER_ID ADDRESS ZIP COUNTRY
    001 1 Adam 1 3205 Woodlake ct 45040 Usa
        Wilfred 2 777 Brockton Avenue 34000 Ita
    002 2 Bill 1 328 Virginia Ave 54956 Bol
        Stiv 2 Schererville 46375 Efi
  • Verify the insertion into the MapR Database JSON table data using the find statement:
    find '/nested_data_insert'
                  
    {
      "Adam": {
          "user_id": "1",
          "address": "3205 Woodlake ct",
          "zip": "45040",
          "country": "Usa"
               },
      "Wilfred": {
          "user_id": "2",
          "address": "777 Brockton Avenue",
          "zip": "34000",
          "country": "Ita"
               }
            }
    {
       "Bill": {
          "user_id": "1",
          "address": "328 Virginia Ave",
          "zip": "54956",
          "country": "Bol"
            },
       "Stiv": {
          "user_id": "2",
          "address": "Schererville",
          "zip": "46375",
          "country": "Efi"
            }
    }

Multiple-row insert

Now imagine that you want to insert three rows of data into nested_data_insert.

  • You can insert the new rows into your table by using a dummy table:
    WITH dummy_table AS
      (SELECT '003' AS KEY,
              '3' AS num,
              MAP ('Rony',
                   Named_struct ('user_id', '1', 'address', '4333 Backer str', 'zip', '12311', 'country', 'Hun')) AS postal_addresses
       UNION ALL SELECT '004' AS KEY,
                        '4' AS num,
                        MAP ('Ivan',
                             Named_struct ('user_id', '1', 'address', '833 Bridle Avenue', 'zip', '95111', 'country', 'CA')) AS postal_addresses
       UNION ALL SELECT '005' AS KEY,
                        '5' AS num,
                        MAP ('Ivan',
                             Named_struct ('user_id', '1', 'address', '664 Devon Ave', 'zip', '92021', 'country', 'Tog')) AS postal_addresses)
    INSERT INTO nested_data_insert
    SELECT *
    FROM dummy_table;
  • Alternatively, you can insert the new rows into your table by using a SELECT statement:
    INSERT INTO TABLE nested_data_insert
    SELECT '006',
           '6',
           MAP ('Rony',
                Named_struct ('user_id', '1', 'address', '150 National City', 'zip', '91950', 'country', 'Hun'))
    UNION ALL
    SELECT '007',
           '7',
           MAP ('Tomason',
                Named_struct ('user_id', '1', 'address', '272 Ocean Circle' , 'zip', '92801', 'country', 'CA'))
    UNION ALL
    SELECT '008',
           '8',
           MAP ('Davin',
                Named_struct ('user_id', '1', 'address', '81 Augusta Ave', 'zip', '93905', 'country', 'CA'));
After you insert data, you should verify that the data is inserted in both Hive and MapR Database JSON tables:
  • Verify the insertion into the Hive table by using the SELECT * FROM syntax.
    SELECT * FROM nested_data_insert WHERE entry > '002' ;

    Sample output:

    Table 2. Result of the insert queries 2.1 and 2.2:
    entry num postal_address
      USER_ID ADDRESS ZIP COUNTRY
    003 3 Rony 1 4333 Backer str 12311 Hun
    004 4 Ivan 1 833 Bridle Avenue 95111 CA
    005 5 Ivan 1 664 Devon Ave. 92021 Tog
    006 6 Rony 1 150 National City 91950 Hun
    007 7 Tomason 1 272 Ocean Circle

    92801

    CA
    008 8 Davin 1 81 Augusta Ave 93905 CA
  • Verify the insertion into the MapR Database JSON table data using the find statement:
    find '/nested_data_insert'
                    
    {
       "_id": "003",
       "num": {
         "$numberInt": 3
              },
       "postal_addresses": {
            "Rony": {
                 "address": "4333 Backer str",
                 "country": "Hun",
                 "user_id": "1",
                 "zip": "12311"
                 }
             }
          }
    {
       "_id": "004",
       "num": {
           "$numberInt": 4
              },
       "postal_addresses": {
              "Ivan": {
                  "address": "833 Bridle Avenue",
                  "country": "CA",
                  "user_id": "1",
                  "zip": "95111"
                }
           }
     }
     {
       "_id": "005",
       "num": {
               "$numberInt": 5
              },
       "postal_addresses": {
               "Ivan": {
                   "address": "664 Devon Ave",
                   "country": "Tog",
                   "user_id": "1",
                    "zip": "92021"
              }
           }
        }
    {
         "_id": "006",
         "num": {
               "$numberInt": 6
                },
         "postal_addresses": {
              "Rony": {
                 "address": "150 National City",
                 "country": "Hun",
                 "user_id": "1",
                 "zip": "91950"
               }
            }
        }
    {
        "_id": "007",
        "num": {
             "$numberInt": 7
               },
        "postal_addresses": {
              "Tomason": {
                  "address": "272 Ocean Circle",
                  "country": "CA",
                  "user_id": "1",
                  "zip": "92801"
               }
           }
        }
    {
        "_id": "008",
        "num": {
             "$numberInt": 8
               },
        "postal_addresses": {
               "Davin": {
                    "address": "81 Augusta Ave",
                    "country": "CA",
                    "user_id": "1",
                    "zip": "93905"
                }
             }
    }

Overwriting data

Still using sample table nested_data_insert, you can use the INSERT statement on a dummy table to overwrite one or more complete rows.

For example, to overwrite the first row in nested_data_insert (001) with new values, use the following syntax:
WITH dummy_table AS
(SELECT '001' AS KEY,
'1' AS num,
MAP ('newAdam',
Named_struct ('user_id', '1', 'address', 'newAdress', 'zip', 'newZip', 'country', 'newCountry')) AS postal_addresses)
INSERT INTO nested_data_insert
SELECT *
FROM dummy_table;
After you overwrite data, you should verify that the data is changed in both Hive and MapR Database JSON tables:
  • Verify the data into the Hive table by using the SELECT * FROM syntax.
    hive> SELECT * FROM nested_data_insert WHERE entry = '001';

    Sample output:

    Table 3. Result of the insert query 3.1:
    entry num postal_address
      USER_ID ADDRESS ZIP COUNTRY
    001 1 newAdam 1 newAddress newZip newCountry
  • Verify the data in the MapR Database JSON table data using the findbyid statement:
    findbyid '/nested_data_insert' --id 001
                  
    {
       "_id": "001",
       "num": {
       "$numberInt": 1
               },
       "postal_addresses": {
            "newAdam": {
                  "address": "newAdress",
                  "country": "newCountry",
                  "user_id": "1",
                  "zip": "newZip"
               }
           }
    } 
For another example, imagine that you want to overwrite 003 and 004 rows in nested_data_insert with new values:
WITH dummy_table AS (
SELECT '003' AS KEY,
'3' AS num,
MAP ('newName1',
Named_struct ('user_id', '1', 'address', 'newAdress1', 'zip', 'newZip1', 'country', 'newCountry1')) AS postal_addresses
UNION ALL
SELECT '004' AS KEY,
'4' AS num,
MAP ('newName2',
Named_struct ('user_id', '1', 'address', 'newAdress2', 'zip', 'newZip2', 'country', 'newCountry2')) AS postal_addresses)
INSERT INTO nested_data_insert
SELECT * FROM dummy_table;
After you overwrite the data, you should verify that the data is changed in both Hive and MapR Database JSON tables.
  • Verify the data in the Hive table by using the SELECT * FROM syntax.
    hive> SELECT * FROM nested_data_insert WHERE entry IN ('003', '004');

    Sample output:

    Table 4. Result of the insert query 3.3:
    entry num postal_address
      USER_ID ADDRESS ZIP COUNTRY
    003 3 newName1 1 newAddress1 newZip1 newCountry1
    004 4 newName2 1 newAddress2 newZip2 newCountry2
    Verify the data in the MapR Database JSON table data using the findbyid statement:
    findbyid '/nested_data_insert' --id 003
    {
      "_id": "003",
      "num": {
           "$numberInt": 3
              },
      "postal_addresses": {
      "newName1": {
           "address": "newAdress1",
           "country": "newCountry1",
           "user_id": "1",
           "zip": "newZip1"
               }
            }
         }
    findbyid '/nested_data_insert' --id 004
         {
           "_id": "004",
           "num": {
              "$numberInt": 4
              },
           "postal_addresses": {
               "newName2": {
                  "address": "newAdress2",
                  "country": "newCountry2",
                  "user_id": "1",
                  "zip": "newZip2"
               }
            }
        }
    Warning: If you exclude columns both from the SELECT statement in your INSERT statement and from the table schema, the value of this column changes to NULL.
Finally, imagine that you want to overwrite the first row in nested_data_insert (001) with new values and overwrite the num column to NULL:
WITH dummy_table AS
(SELECT '001' AS KEY,
MAP ('newAdam',
Named_struct ('user_id', '1', 'address', 'newAdress', 'zip', 'newZip', 'country', 'newCountry')) AS postal_addresses)
INSERT INTO nested_data_insert (entry, postal_addresses)
SELECT * FROM dummy_table;
After you overwrite data, you should verify that the data is changed in both Hive and MapR Database JSON tables.
  • Verify the data in the Hive table by using the SELECT * FROM syntax.
    hive> SELECT * FROM nested_data_insert WHERE entry = '001';

    Sample output:

    Table 5. Result of the insert query 3.5:
    entry num postal_address
      USER_ID ADDRESS ZIP COUNTRY
    001 NULL newAdam 1 newAddress newZip newCountry
  • Verify the data in the MapR Database JSON table (num row is not present):
    findbyid '/nested_data_insert' --id 001
                  
    {
       "_id": "001",
       "postal_addresses": {
       "newAdam": {
             "address": "newAdress",
             "country": "newCountry",
             "user_id": "1",
             "zip": "newZip"
            }
         }
    }