Data Analysis with Apache Pig


About this Course

This course covers how to use Apache Pig as part of an ETL process in a Hadoop cluster. The course begins with using the Grunt shell and the Pig Latin language to manipulate semi-structured raw data files in Pig. Once the raw data has been transformed into structured tables, those tables are exported from Pig and imported into Hive.

Duration: 1 day

What’s Covered in the Course

1: Introduction to Apache Pig
  • Define Apache Pig
  • Describe How Apache Pig Fits in the Data Pipeline
  • Understand Data Types in Pig
Lab Activities
    • Connect to the Grunt Shell
2: Extract, Transform, and Load Data with Apache Pig
  • Load Data into Pig Relations
  • Examine Data and Debug Scripts
  • Use FOREACH … GENERATE on Data
  • Store Data for Use with Other Applications
Lab Activities
    • Load Data into Pig Relations
    • Examine Pig Relations
    • Basic Data Manipulations
    • Store Data
3: Manipulate Data in Apache Pig
  • Subset Data
  • Combine Data
  • Manipulate Data
Lab Activities
    • Load and Filter Relations
    • Transform and Join Relations
    • Explore Data