MapR 5.0 Documentation : Get Started with Pig

In this tutorial, we'll use Pig to run a MapReduce job that counts the words in the file constitution.txt, located in the directory /user/mapr/in on the cluster, and stores the results in the directory /user/mapr/wordcount.

  • First, download the file: select Tools > Attachments on this Confluence page (see the top-right corner of the page) and right-click constitution.txt to save it.
  • Copy the file to the cluster, into the directory /user/mapr/in.
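
One way to do the copy is from Pig's own Grunt shell (started with the pig command, as in step 1 below), whose fs command passes its arguments to the Hadoop file system shell. A minimal sketch, assuming constitution.txt was saved in the local directory from which pig was started:

```pig
-- Create the input directory on the cluster if it does not exist yet.
fs -mkdir -p /user/mapr/in
-- Copy the downloaded file from the local working directory (assumed location) to the cluster.
fs -put constitution.txt /user/mapr/in/
-- Confirm that the file arrived.
fs -ls /user/mapr/in
```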

Perform the following steps: 

  1. In the terminal, type the command pig to start the Pig shell.
  2. At the grunt> prompt, type the following lines (press ENTER after each):

    A = LOAD '/user/mapr/in' USING TextLoader() AS (words:chararray);
    B = FOREACH A GENERATE FLATTEN(TOKENIZE(*));
    C = GROUP B BY $0;
    D = FOREACH C GENERATE group, COUNT(B);
    STORE D INTO '/user/mapr/wordcount';

    Pig only builds an execution plan as you enter the first four lines. After you type the last line (the STORE statement), Pig starts a MapReduce job to count the words in the file constitution.txt.

  3. When the MapReduce job is complete, type quit to exit the Pig shell, then look at the contents of the directory /user/mapr/wordcount on the cluster to see the results.
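
The results can also be inspected from the Grunt shell itself before quitting. The following is a sketch under assumptions, not part of the original tutorial: the aliases counts, sorted, and top10 and the field names word and total are illustrative, and it relies on STORE having used Pig's default tab-delimited storage, which the default loader reads back.

```pig
-- List the files that STORE wrote.
fs -ls /user/mapr/wordcount
-- Read the stored (word, count) pairs back in.
counts = LOAD '/user/mapr/wordcount' AS (word:chararray, total:long);
-- Sort by count, highest first, and keep the ten most frequent words.
sorted = ORDER counts BY total DESC;
top10 = LIMIT sorted 10;
-- DUMP prints the result to the console; like STORE, it triggers execution.
DUMP top10;
```

Because DUMP starts another MapReduce job, expect a short wait before the ten rows appear.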

Attachments:

constitution.txt (text/plain)