This course covers how to use Pig to analyze structured data without writing MapReduce code. It starts with a review of data pipeline tools, then covers how to load data into relations, manipulate those relations, and apply user-defined functions (UDFs) to them.
Taken together with DA 440 – Query and Store Data with Apache Hive, this course teaches you how to use Pig and Hive as part of a single data flow in a Hadoop cluster.
|Course Lessons|Lab Activities|
|1: Pig in the Hadoop Ecosystem (Use Cases of Pig; Steps in the Data Pipeline; Data Types Used in Pig)|Connect to the Grunt Shell|
|2: Extract, Transform, and Load Data (Load Data into Relations; Debug Pig Scripts; Perform Simple Manipulations; Save Relations as Files)|Load Data into Pig Relations; Examine Pig Relations; Basic Data Manipulations|
|3: Manipulate Data (Subset Relations; Use UDFs on Relations)|Load and Filter Relations; Transform and Join Relations|
This course is part of the preparation needed for the MapR Certified Data Analyst (MCDA) certification exam.