Transform Data with Apache Pig


About this Course

This course covers how to use Pig to analyze structured data without writing MapReduce code. It starts with a review of data pipeline tools, then covers how to load and manipulate relations and how to apply UDFs to relations in Pig. Taken together with DA 440 – Query and Store Data with Apache Hive, this course teaches you how to use Pig and Hive as part of a single data flow in a Hadoop cluster.

What’s Covered in the Course

1: Pig in the Hadoop Ecosystem
  • Use Cases of Pig
  • Steps in the Data Pipeline
  • Data Types Used in Pig
  Lab Activities
    • Connect to the Grunt Shell
2: Extract, Transform, and Load Data
  • Load Data into Relations
  • Debug Pig Scripts
  • Perform Simple Manipulations
  • Save Relations as Files
  Lab Activities
    • Load Data into Pig Relations
    • Examine Pig Relations
    • Basic Data Manipulations
    • Store Data
3: Manipulate Data
  • Subset Relations
  • Combine Relations
  • Use UDFs on Relations
  Lab Activities
    • Load and Filter Relations
    • Transform and Join Relations
    • Explore Data