Run Hive Jobs with Oozie

Complete the following steps to configure Oozie to submit Hive jobs:
  1. (Optional) Update the Hive shared libraries. By default, Oozie ships with shared libraries for a specific Hive version. To update the shared libraries with the version of Hive that you are running, complete the following steps:
    1. Stop Oozie.
      maprcli node services -name oozie -action stop -nodes <space delimited list of nodes>
    2. Remove Hive libraries from <OOZIE_HOME>/share2/lib/hive/.
      rm -rf /opt/mapr/oozie/oozie-<version>/share2/lib/hive/hive-*
      rm -rf /opt/mapr/oozie/oozie-<version>/share2/lib/hive/jline-*
    3. As of Oozie 4.2.0-1510, also remove Hive libraries from <OOZIE_HOME>/share1/lib/hive/:
      rm -rf /opt/mapr/oozie/oozie-<version>/share1/lib/hive/hive-*
      rm -rf /opt/mapr/oozie/oozie-<version>/share1/lib/hive/jline-*
    4. Copy the following JAR files from <HIVE_HOME>/lib/ to <OOZIE_HOME>/share2/lib/hive/:
      hive-ant 
      hive-cli
      hive-common
      hive-contrib
      hive-exec
      hive-metastore
      hive-serde
      hive-service
      hive-shims
      hive-shims-0.20
      hive-shims-0.20S
      hive-shims-0.23
      hive-shims-common
      hive-shims-common-secure
      Example:
      cp /opt/mapr/hive/hive-<version>/lib/{hive-ant*.jar,hive-cli*.jar,hive-common*.jar,hive-contrib*.jar,hive-exec*.jar,hive-metastore*.jar,hive-serde*.jar,hive-service*.jar,hive-shims*.jar} /opt/mapr/oozie/oozie-<version>/share2/lib/hive/
      cp /opt/mapr/hive/hive-<version>/lib/jline-* /opt/mapr/oozie/oozie-<version>/share2/lib/hive/
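As a quick sanity check (a sketch, assuming the default /opt/mapr install locations used in the example), list the directory and confirm that the version suffixes on the copied JARs match your installed Hive version:

```shell
# The staged JARs should carry your installed Hive version in their names
ls /opt/mapr/oozie/oozie-<version>/share2/lib/hive/
```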
    5. As of Oozie 4.2.0-1510, also copy the following JAR files from <HIVE_HOME>/lib/ to <OOZIE_HOME>/share1/lib/hive/:
      hive-ant 
      hive-cli
      hive-common
      hive-contrib
      hive-exec
      hive-metastore
      hive-serde
      hive-service
      hive-shims
      hive-shims-0.20
      hive-shims-0.20S
      hive-shims-0.23
      hive-shims-common
      hive-shims-common-secure
      Example:
      cp /opt/mapr/hive/hive-<version>/lib/{hive-ant*.jar,hive-cli*.jar,hive-common*.jar,hive-contrib*.jar,hive-exec*.jar,hive-metastore*.jar,hive-serde*.jar,hive-service*.jar,hive-shims*.jar} /opt/mapr/oozie/oozie-<version>/share1/lib/hive/
      cp /opt/mapr/hive/hive-<version>/lib/jline-* /opt/mapr/oozie/oozie-<version>/share1/lib/hive/
    6. Start Oozie.
      maprcli node services -name oozie -action start -nodes <space delimited list of nodes>
      Note: If high availability is enabled for Oozie, perform steps 1 through 6 on all nodes where Oozie is installed.
    7. As of Oozie 4.1.0-1601 and Oozie 4.2.0-1601, if the oozie.service.WorkflowAppService.system.libpath property in oozie-site.xml does not use the default value (/oozie/share/lib), you must perform the following steps to update the shared libraries:
      1. Based on the cluster MapReduce mode, run one of the following commands to copy the new Oozie shared libraries to MapR-FS:
        YARN:
          sudo -u mapr {OOZIE_HOME}/bin/oozie-setup.sh sharelib create -fs maprfs:/// -locallib /opt/mapr/oozie/oozie-<version>/share2
        Classic:
          sudo -u mapr {OOZIE_HOME}/bin/oozie-setup.sh sharelib create -fs maprfs:/// -locallib /opt/mapr/oozie/oozie-<version>/share1
      2. Run the following command to update the Oozie classpath with the new shared libraries:
        sudo -u mapr {OOZIE_HOME}/bin/oozie admin -sharelibupdate
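After the update, you can confirm which shared libraries the running Oozie server actually sees. The admin -shareliblist subcommand is part of the standard Oozie CLI; passing hive narrows the listing to the Hive shared library:

```shell
# List the Hive shared-library JARs currently registered with the Oozie server
sudo -u mapr {OOZIE_HOME}/bin/oozie admin -shareliblist hive
```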
  2. (Optional) Configure Hive to use the metastore server.
    1. To use a metastore server for the Hive job, add the following parameter to the hive-site.xml file:
      <property>
       <name>hive.metastore.uris</name>
       <value>thrift://<IP address>:<port></value>
       <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
      </property>
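To verify that clients pick up the remote metastore setting, a lightweight smoke test is to run a statement that must go through the metastore. A sketch, assuming the Hive CLI is on the path and the edited hive-site.xml is in its configuration directory:

```shell
# Lists databases via the remote metastore configured in hive.metastore.uris
hive -e 'SHOW DATABASES;'
```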
  3. Configure a Hive workflow. As of Oozie 4.2.0-1508, you can configure Oozie to perform a workflow by connecting to the Hive Metastore or to HiveServer2. Previously, Oozie could only submit jobs through the Hive Metastore.
    Configure a Hive Workflow with Connection to Hive Metastore
    1. Copy the edited hive-site.xml file to the same location as your workflow.xml file.
    2. Edit the workflow.xml file to include the following:
      1. Specify the hive-site.xml file in the job-xml element.
      2. Specify the name of the script (for example, script.q) that contains the Hive query in the script element.
      3. Optionally, add properties used by the Oozie launcher job. Add the prefix oozie.launcher to the property names.
      <workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
          <start to="hive-node"/>  
          <action name="hive-node">
              <hive xmlns="uri:oozie:hive-action:0.2">
                  <job-tracker>${jobTracker}</job-tracker>
                  <name-node>${nameNode}</name-node>
                  <prepare>
                      <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive"/>
                      <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
                  </prepare>
                  <job-xml>hive-site.xml</job-xml>
                  <configuration>
                      <property>
                          <name>mapred.job.queue.name</name>
                          <value>${queueName}</value>
                      </property>
                  </configuration>
                  <script>script.q</script>
                  <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>
                  <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive</param>
              </hive>
              <ok to="end"/>
              <error to="fail"/>
          </action>
        
          <kill name="fail">
              <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
          </kill>
          <end name="end"/>
      </workflow-app>
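To submit this workflow, Oozie also needs a job.properties file that defines the variables referenced above (jobTracker, nameNode, examplesRoot, queueName) and the workflow application path. The values below are illustrative placeholders, not defaults; adjust them for your cluster:

```properties
# job.properties (illustrative values)
nameNode=maprfs:///
jobTracker=maprfs:///
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/hive
```

You can then submit the job with the standard Oozie CLI, for example: oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run (the host and port are placeholders for your Oozie server URL).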
    Configure a Hive Workflow with Connection to HiveServer2
    1. Copy the edited hive-site.xml file to the same location as your workflow.xml file.
    2. Edit the workflow.xml file to include the following:
      1. Specify the JDBC URL used by Beeline for connections to HiveServer2 in the jdbc-url element. See Connecting to HiveServer2 for details.
      2. Specify the name of the script (for example, script.q) that contains the hive query in the script element.
        <?xml version="1.0" encoding="UTF-8"?>
        <workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">
            <start to="hive2-node"/>
            <action name="hive2-node">
                <hive2 xmlns="uri:oozie:hive2-action:0.1">
                    <job-tracker>${jobTracker}</job-tracker>
                    <name-node>${nameNode}</name-node>
                    <prepare>
                        <delete path="${nameNode}/user/${wf:user()}/output-data/hive2"/>
                        <mkdir path="${nameNode}/user/${wf:user()}/output-data"/>
                    </prepare>
                    <configuration>
                        <property>
                            <name>mapred.job.queue.name</name>
                            <value>${queueName}</value>
                        </property>
                    </configuration>
                    <jdbc-url>jdbc:hive2://localhost:10000/default</jdbc-url>
                    <script>script.q</script>
                    <param>INPUT=/user/${wf:user()}/input-data/table</param>
                    <param>OUTPUT=/user/${wf:user()}/output-data/hive2</param>
                </hive2>
                <ok to="end"/>
                <error to="fail"/>
            </action>
            <kill name="fail">
                <message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
            </kill>
            <end name="end"/>
        </workflow-app>
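Both workflow examples pass INPUT and OUTPUT parameters into a script.q file containing the Hive query. A minimal sketch of such a script, modeled on the stock Oozie Hive example (the table name and schema are placeholders):

```sql
-- script.q: ${INPUT} and ${OUTPUT} are supplied by the workflow's <param> elements
CREATE EXTERNAL TABLE IF NOT EXISTS test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;
```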