Dataware for data-driven transformation

Develop Apache Spark Applications (Spark v2.1)


About this Course

In this course, you will learn how to create big data applications with Apache Spark 2.1. You will use Spark’s interactive shell to load and inspect data, then learn about the various modes for launching a Spark application. You will go on to build and launch a standalone Spark application using Datasets and DataFrames. Later, you will learn to create a Spark Streaming application and to use GraphFrames and MLlib.

Duration: 3 days

What’s Covered in the Course

1: Apache Spark Essentials (DEV 360)
  • Describe Features of Apache Spark
  • Define Spark Components
  • Explain Spark Data Pipeline Use Cases
Lab Activities
    • No labs
2: Create Datasets
  • Define Data Sources, Structures, and Schemas
  • Create Datasets and DataFrames
  • Convert DataFrames into Datasets
Lab Activities
    • Load Data and Create Datasets Using Reflection
    • Word Count Using Datasets (Optional)
3: Apply Operations on Datasets
  • Apply Operations on Datasets
  • Cache Datasets
  • Create User Defined Functions (UDFs)
  • Repartition Datasets
Lab Activities
    • Explore SFPD Data
    • Create and Use UDFs
    • Analyze Data Using UDFs and Queries
4: Build a Simple Apache Spark Application (DEV 361)
  • Define the Spark Program Lifecycle
  • Define SparkSession
  • Describe Ways to Launch Spark Applications
  • Launch a Spark Application
Lab Activities
    • Import and Configure Application Files
    • Complete, Package, and Launch the Application
5: Monitor Apache Spark Applications
  • Describe Logical and Physical Plans of Spark Execution
  • Use Spark Web UI to Monitor Spark Applications
  • Debug and Tune Spark Applications
Lab Activities
    • Use the Spark UI
    • Find Spark System Properties
6: Create an Apache Spark Streaming Application (DEV 362)
  • Describe Spark Streaming Architecture
  • Create a Spark Structured Streaming Application
  • Apply Operations on Streaming DataFrames
  • Define Window Operations
Lab Activities
    • Load and Inspect Data Using the Spark Shell
    • Use Spark Streaming with the Spark Shell
    • Build and Run a Streaming Application with SQL
    • Build and Run a Streaming Application with Window Operations and SQL
7: Use Apache Spark GraphFrames
  • Describe GraphFrames
  • Define Regular, Directed, and Property Graphs
  • Create a Property Graph
  • Perform Operations on Graphs
Lab Activities
    • Analyze Data with GraphFrames
8: Use Apache Spark MLlib
  • Describe Apache Spark MLlib Machine Learning Algorithms
  • Use Collaborative Filtering to Predict User Choice
Lab Activities
    • Load and Inspect Data Using the Spark Shell
    • Use Spark to Make Movie Recommendations
    • Analyze a Simple Flight Example with Decision Trees