MapR 5.0 Documentation : Use ORC Storage with Pig

As of Pig 0.14, MapR supports the Hive ORC storage format for reading and writing data. This section presents simple examples of how to use ORC storage. For more details, see the Pig documentation.

The OrcStorage function is typically used to read (load) and write (store) ORC data as follows:

<VAR_NAME> = load '/path/to/orc/formatted/file' using OrcStorage(); 
store <VAR_NAME> into '/path/to/output/orc/file' using OrcStorage('');

Run these commands in the Pig Grunt shell.

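The two examples below show how to create an ORC file through a Hive table and load it in Pig, and how to use Pig to save a text file from MapR-FS in ORC format.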

Create an ORC format file in MapR-FS by storing the data in a Hive table, then load it in Pig

You can create an ORC format file in MapR-FS by using Hive to load a text file into a plain-text table and then copying its rows into a second table that uses ORC storage. You can then load the resulting ORC file in Pig.

  1. Create a sample test data file:

    cd /home/mapr
    nano test_pig.data
    chown mapr:mapr test_pig.data
  2. Add data to the file.

    Example
    John,Smith
    Brian,May
    Rodger,Taylor
    John,Deacon
    Max,Plank
    Freddie,Mercury
    Albert,Einstein
    Fedor,Dostoevsky
    Lev,Tolstoy
    Niccolo,Paganini

    Do not include any blank lines at the end of the file. Hive loads each blank line as an extra row.

  3. Load the test data into a Hive table:

    sudo -u mapr hive
    hive> create table test_pig(first_name string, last_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    hive> load data local inpath '/home/mapr/test_pig.data' overwrite into table test_pig;
  4. Create a Hive table with ORC storage:

    hive> create table test_pig_orc(first_name string, last_name string) stored as orc tblproperties ("orc.compress"="NONE");
    hive> insert overwrite table test_pig_orc select * from test_pig;
    hive> select * from test_pig_orc;
  5. Check that the ORC file was created:

    hadoop fs -ls /user/hive/warehouse/test_pig_orc
  6. Load the ORC file in Pig:

    sudo -u mapr pig
    grunt> B = load '/user/hive/warehouse/test_pig_orc/000000_0' using OrcStorage();
    grunt> dump B;
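
Steps 1 and 2 above can also be done without an interactive editor. The following is a minimal sketch, not part of the original procedure: it writes the same ten records with `printf` and then counts records and blank lines, so a stray trailing line is caught before the file is loaded into Hive. The `DATA_FILE` variable is a hypothetical name introduced for this example; set it to `/home/mapr/test_pig.data` to match the steps above.

```shell
#!/bin/sh
# Sketch: create the sample data file non-interactively.
# DATA_FILE is a hypothetical variable; it defaults to the current directory.
DATA_FILE="${DATA_FILE:-./test_pig.data}"

# printf emits exactly one newline per argument, so the file ends
# right after the last record with no blank line.
printf '%s\n' \
  'John,Smith' \
  'Brian,May' \
  'Rodger,Taylor' \
  'John,Deacon' \
  'Max,Plank' \
  'Freddie,Mercury' \
  'Albert,Einstein' \
  'Fedor,Dostoevsky' \
  'Lev,Tolstoy' \
  'Niccolo,Paganini' > "$DATA_FILE"

# Count records and blank lines. grep -c exits non-zero when it finds
# no match, so "|| true" keeps the script going under "set -e".
RECORDS=$(wc -l < "$DATA_FILE" | tr -d ' ')
BLANKS=$(grep -c '^$' "$DATA_FILE" || true)
echo "records=$RECORDS blank_lines=$BLANKS"
```

If the reported blank line count is not 0, remove the empty lines before running the Hive `load data` command.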

Upload a text file to MapR-FS and use Pig to save it as an ORC format file

  1. Upload the file to MapR-FS:

    cd /home/mapr
    sudo -u mapr hadoop fs -put ./test_pig.data /test_pig.data
    sudo -u mapr hadoop fs -mkdir /output
  2. Start Pig and save the text file in ORC format:

    sudo -u mapr pig
    grunt> A = LOAD '/test_pig.data' using PigStorage(',') AS (first_name:chararray, last_name:chararray);
    grunt> store A into '/output/A' using OrcStorage('');
  3. Verify that the ORC file was created:

    hadoop fs -ls /output/A/
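
Listing the directory confirms that files exist, but not that they are readable as ORC. As a further check, the output can be loaded back in Pig. The sketch below is an illustration under stated assumptions rather than part of the documented procedure: it writes a small Pig script (the file name `read_orc.pig` and the `PIG_SCRIPT` variable are hypothetical) and runs it with `pig -f` only if the `pig` client is on the path. On a cluster configured as in the steps above, you would likely run it as the `mapr` user.

```shell
#!/bin/sh
# Sketch: read the ORC output back in Pig to confirm the round trip.
# PIG_SCRIPT is a hypothetical file name chosen for this example.
PIG_SCRIPT="${PIG_SCRIPT:-./read_orc.pig}"

# The quoted heredoc delimiter stops the shell from expanding
# anything inside the Pig code.
cat > "$PIG_SCRIPT" <<'EOF'
-- Load the ORC part files under the directory written by the store command.
C = load '/output/A' using OrcStorage();
-- Show the schema that OrcStorage reads from the ORC file metadata.
describe C;
-- Print the rows; they should match the records in test_pig.data.
dump C;
EOF

# Run the script only where the pig client is installed.
if command -v pig >/dev/null 2>&1; then
  pig -f "$PIG_SCRIPT"
else
  echo "pig not found; run 'pig -f $PIG_SCRIPT' on a cluster node"
fi
```

Loading the directory rather than a single part file means the check still works if the store job wrote more than one output file.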