DA 450 - Transform Data with Apache Pig

About this Course

This course covers how to use Pig to analyze structured data without writing MapReduce code. It starts with a review of data pipeline tools, then covers how to load data into Pig relations, manipulate those relations, and apply UDFs to them.

Taken together with DA 440 – Query and Store Data with Apache Hive, this course teaches you how to use Pig and Hive as part of a single data flow in a Hadoop cluster.

What’s Covered

1: Pig in the Hadoop Ecosystem
  Lessons:
  • Use Cases of Pig
  • Steps in the Data Pipeline
  • Data Types Used in Pig
  Lab Activities:
  • Connect to the Grunt Shell

2: Extract, Transform, and Load Data
  Lessons:
  • Load Data into Relations
  • Debug Pig Scripts
  • Perform Simple Manipulations
  • Save Relations as Files
  Lab Activities:
  • Load Data into Pig Relations
  • Examine Pig Relations
  • Basic Data Manipulations
  • Store Data

3: Manipulate Data
  Lessons:
  • Subset Relations
  • Combine Relations
  • Use UDFs on Relations
  Lab Activities:
  • Load and Filter Relations
  • Transform and Join Relations
  • Explore Data

Get Certified

This course is part of the preparation needed for the MapR Certified Data Analyst (MCDA) certification exam.

Prerequisites

  • Completion of the MapR Academy on-demand courses: ESS 100 - 102
  • Basic Hadoop knowledge
  • Terminal program installed; familiarity with command-line navigation