As of Pig 0.14, MapR supports the Hive ORC storage format for reading and writing data. This section presents simple examples of how to use ORC storage. For more details, see the Pig documentation.
OrcStorage is typically used to read (load) and write (store) ORC data as follows:
<VAR_NAME> = load '/path/to/orc/formatted/file' using OrcStorage();
store <VAR_NAME> into '/path/to/output/orc/file' using OrcStorage('');
You use the grunt shell in Pig to execute these commands.
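As an alternative to typing the statements interactively, they can be saved in a script file and passed to Pig with the -f option. The following is a minimal sketch, not a prescribed procedure: the file name orc_copy.pig and the alias orc_data are hypothetical, and the paths are the same placeholders used above, so replace them before running. On a MapR node you may need to run the pig command as the mapr user (for example, with sudo -u mapr).

```shell
# Write the load/store statements to a script file.
# The file name orc_copy.pig and the alias orc_data are hypothetical choices.
cat > orc_copy.pig <<'EOF'
-- Placeholder paths copied from the load/store pattern; replace before running.
orc_data = load '/path/to/orc/formatted/file' using OrcStorage();
store orc_data into '/path/to/output/orc/file' using OrcStorage('');
EOF

# Run the script only if the pig launcher is on PATH; -f names the script file.
if command -v pig >/dev/null 2>&1; then
  pig -f orc_copy.pig
else
  echo "pig not found on PATH; script left in orc_copy.pig"
fi
```

Keeping the statements in a file makes the load and store steps repeatable without retyping them in grunt.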
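The following examples show how to create an ORC format file in MapR-FS by storing data in a Hive table and then loading it into Pig, and how to upload a text file to MapR-FS and use Pig to save it as an ORC format file.
Create an ORC format file in MapR-FS by storing the data in a Hive table and loading it into Pig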
You can create an ORC format file in MapR-FS by using Hive to load a text file into a table with ORC storage. Then, you can load the resulting ORC format file into Pig.
Create a sample test data file named test_pig.data and assign ownership of it to the mapr user:
chown mapr:mapr test_pig.data
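Add comma-separated data to the file, one first_name,last_name record per line, to match the Hive table created in the next step.
Do not leave any blank lines at the end of the file.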
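One way to populate the file and confirm that it has no trailing blank lines is sketched below. The two sample rows are hypothetical illustration data, not values from this guide. The sketch writes test_pig.data in the current directory, so run it from /home/mapr if you want the path to match the Hive load statement that follows.

```shell
# Hypothetical sample rows in first_name,last_name form.
# printf writes one newline per row, so the file ends without a blank line.
printf '%s\n' 'John,Smith' 'Jane,Doe' > test_pig.data

# Count blank (empty or whitespace-only) lines; grep exits non-zero when the
# count is 0, so "|| true" keeps the assignment from failing under "set -e".
blank_lines=$(grep -c '^[[:space:]]*$' test_pig.data || true)
echo "blank lines: $blank_lines"
```

A count of 0 indicates that every line holds a record.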
Load the test data into a Hive table:
sudo -u mapr hive
hive> create table test_pig(first_name string, last_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> load data local inpath '/home/mapr/test_pig.data' overwrite into table test_pig;
Create a Hive table with ORC storage:
hive> create table test_pig_orc(first_name string, last_name string) stored as orc tblproperties ("orc.compress"="NONE");
hive> insert overwrite table test_pig_orc select * from test_pig;
hive> select * from test_pig_orc;
Check that the ORC file was created:
hadoop fs -ls /user/hive/warehouse/test_pig_orc
Load the ORC file into Pig:
sudo -u mapr pig
grunt> B = load '/user/hive/warehouse/test_pig_orc/000000_0' using OrcStorage();
grunt> dump B;
Upload a text file to MapR-FS and use Pig to save it as an ORC format file
Upload the file to MapR-FS:
sudo -u mapr hadoop fs -put ./test_pig.data /test_pig.data
sudo -u mapr hadoop fs -mkdir /output
Start Pig and save the text file in ORC format:
sudo -u mapr pig
grunt> A = LOAD '/test_pig.data' using PigStorage(',') AS (first_name:chararray, last_name:chararray);
grunt> store A into '/output/A' using OrcStorage('');
Verify that the ORC file was created: