Introduction to Apache Spark (Spark v2.1)


About this Course

This course teaches developers to build simple applications for Apache Spark version 2.1. You will use Spark’s interactive shell to load and inspect data, then learn the modes available for launching a Spark application. The course also covers working with DataFrames, Datasets, and User-Defined Functions (UDFs).

What’s Covered in the Course

1: Introduction to Apache Spark
  • Describe Features of Apache Spark
  • Define Spark Components
  • Explain Spark Data Pipeline Use Cases
Lab Activities
    • No Lab
2: Create Datasets
  • Define Data Sources, Structures, and Schemas
  • Create Datasets and DataFrames
  • Convert DataFrames into Datasets
Lab Activities
    • Load Data and Create Datasets Using Reflection
    • Bonus Lab: Word Count Using Datasets (Optional)
3: Apply Operations on Datasets
  • Apply Operations on Datasets
  • Cache Datasets
  • Create User-Defined Functions (UDFs)
  • Repartition Datasets
Lab Activities
    • Explore SFPD Data
    • Create and Use UDFs
    • Analyze Data Using UDFs and Queries