MapR 5.0 Documentation : Run Pig Jobs with Oozie

Complete the following steps to configure Oozie to run Pig jobs:

Update the Pig Shared Libraries (optional)

By default, Oozie ships with shared libraries for a specific Pig version. To update the shared libraries with the version of Pig that you are running, complete the following steps: 

  1. Stop Oozie:

    maprcli node services -name oozie -action stop -nodes <space delimited list of nodes>
  2. Remove all files located within the /opt/mapr/oozie/oozie<version>/share2/lib/pig*/ directory EXCEPT the oozie-sharelib-pig-<version>-mapr.jar file.
  3. As of Oozie 4.2.0-1510, also remove all files located within the /opt/mapr/oozie/oozie<version>/share1/lib/pig*/ directory EXCEPT the oozie-sharelib-pig-<version>-mapr.jar file.
  4. Copy the Pig core JAR and the contents of the Pig lib directory into the Oozie shared libraries for Pig:

    cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share2/lib/pig/
    cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share2/lib/pig-2/
    cp <PIG_HOME>/lib/* <OOZIE_HOME>/share2/lib/pig/
    cp <PIG_HOME>/lib/* <OOZIE_HOME>/share2/lib/pig-2/ 
  5. As of Oozie 4.2.0-1510, also copy the Pig core JAR and the contents of the Pig lib directory into the Oozie share1 libraries for Pig:

    cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share1/lib/pig/
    cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share1/lib/pig-2/
    cp <PIG_HOME>/lib/* <OOZIE_HOME>/share1/lib/pig/
    cp <PIG_HOME>/lib/* <OOZIE_HOME>/share1/lib/pig-2/
  6. Remove the ZooKeeper JARs and the h1 directories:

    rm -rf <OOZIE_HOME>/share2/lib/pig/h1 <OOZIE_HOME>/share2/lib/pig/zookeeper*.jar
    rm -rf <OOZIE_HOME>/share2/lib/pig-2/h1 <OOZIE_HOME>/share2/lib/pig-2/zookeeper*.jar
  7. As of Oozie 4.2.0-1510, also remove the ZooKeeper JARs and the h1 directories from the Oozie share1 libraries:

    rm -rf <OOZIE_HOME>/share1/lib/pig/h1 <OOZIE_HOME>/share1/lib/pig/zookeeper*.jar
    rm -rf <OOZIE_HOME>/share1/lib/pig-2/h1 <OOZIE_HOME>/share1/lib/pig-2/zookeeper*.jar
  8. Start Oozie:

    maprcli node services -name oozie -action start -nodes <space delimited list of nodes>

If high availability is enabled for Oozie, repeat steps 2 through 7 on all nodes where Oozie is installed.
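
The file operations in steps 2 through 7 can be collected into one shell function, which may be convenient when repeating them on several nodes. This is a minimal sketch under stated assumptions, not a MapR-provided tool: the function name `refresh_pig_sharelib` is hypothetical, Oozie is assumed to be stopped already (step 1), and the Pig core JAR is assumed to be named pig-core-h2.jar as in step 4. Like the `cp <PIG_HOME>/lib/*` commands, it copies only the regular files in the Pig lib directory, and it processes share1 only when that directory exists.

```shell
#!/usr/bin/env bash
# Minimal sketch of steps 2-7 for one node. Hypothetical helper, not part of MapR.
# Assumes Oozie is already stopped (step 1) and is restarted afterwards (step 8).
set -eu

# Usage: refresh_pig_sharelib <OOZIE_HOME> <PIG_HOME>
refresh_pig_sharelib() {
  local oozie_home="$1" pig_home="$2"
  local share lib_dir name dest

  for share in share1 share2; do
    # share1 only exists on newer Oozie packages (steps 3, 5, and 7); skip it otherwise.
    [ -d "$oozie_home/$share/lib" ] || continue

    # Steps 2-3: empty every pig*/ directory except the oozie-sharelib-pig-<version>-mapr.jar file.
    for lib_dir in "$oozie_home/$share/lib"/pig*/; do
      [ -d "$lib_dir" ] || continue
      find "$lib_dir" -mindepth 1 -maxdepth 1 ! -name 'oozie-sharelib-pig-*-mapr.jar' \
        -exec rm -rf {} +
    done

    for name in pig pig-2; do
      dest="$oozie_home/$share/lib/$name"
      [ -d "$dest" ] || continue
      # Steps 4-5: copy the Pig core JAR, then the regular files in <PIG_HOME>/lib
      # (like `cp <PIG_HOME>/lib/*`, which does not copy subdirectories).
      cp "$pig_home/pig-core-h2.jar" "$dest/"
      find "$pig_home/lib" -maxdepth 1 -type f -exec cp {} "$dest/" \;
      # Steps 6-7: drop the h1 directory and the ZooKeeper JARs.
      rm -rf "$dest/h1" "$dest"/zookeeper*.jar
    done
  done
}
```

For example, after stopping Oozie on a node, run `refresh_pig_sharelib /opt/mapr/oozie/oozie<version> <PIG_HOME>` and then start Oozie again as in step 8.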

Configure a Pig Workflow

Edit the workflow.xml file to include the following:

  1. Specify the shared library with the oozie.action.sharelib.for.pig property. With MapR distribution versions 4.0.0 and later, set this property to pig-2.
  2. Specify the name of the script that contains the Pig query (for example, id.pig) in the script element. Optionally, pass parameters to the script with param elements.

Example Workflow

<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/output-data/pig"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.action.sharelib.for.pig</name>
                    <value>pig-2</value>
                </property>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <script>id.pig</script>
            <param>INPUT=/user/${wf:user()}/input-data/text</param>
            <param>OUTPUT=/user/${wf:user()}/output-data/pig</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
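
The ${jobTracker}, ${nameNode}, and ${queueName} variables in the example workflow are typically resolved from the job.properties file supplied when the job is submitted. The following minimal shell sketch stages a local application directory with a job.properties file and an id.pig script. Everything in it is an assumption for illustration: the function name `write_pig_job` is hypothetical, `<name_node_uri>` and `<resource_manager_address>` are placeholders to replace for your cluster, and the id.pig body is a hypothetical script that only shows how the INPUT and OUTPUT parameters are referenced. Setting oozie.use.system.libpath=true tells Oozie to add the share libraries to the action classpath.

```shell
#!/usr/bin/env bash
# Minimal sketch: stage a Pig workflow application directory locally.
# write_pig_job is a hypothetical helper; <...> values are placeholders.
set -eu

# Usage: write_pig_job <local_app_dir> <application_path_on_cluster>
write_pig_job() {
  local app_dir="$1" app_path="$2"
  mkdir -p "$app_dir"

  # Hypothetical id.pig: Oozie passes INPUT and OUTPUT through the <param> elements.
  cat > "$app_dir/id.pig" <<'EOF'
A = LOAD '$INPUT' USING PigStorage(':');
B = FOREACH A GENERATE $0 AS id;
STORE B INTO '$OUTPUT' USING PigStorage();
EOF

  # Values referenced by workflow.xml, plus the settings Oozie needs to
  # locate the application and to include the share libraries.
  cat > "$app_dir/job.properties" <<EOF
nameNode=<name_node_uri>
jobTracker=<resource_manager_address>
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=$app_path
EOF
}
```

For example, after copying workflow.xml into the staged directory, one possible sequence is to upload the directory with `hadoop fs -put` and submit it with `oozie job -oozie http://<oozie_host>:11000/oozie -config <local_app_dir>/job.properties -run`, where 11000 is the default Oozie port and oozie.wf.application.path points at the uploaded directory.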