DEV 362 - Create Data Pipelines Using Apache Spark


With MapR Academy Pro, you get advanced courses, lab exercises, custom sandboxes, quizzes, interactive content, and course certificates.


About this Course

DEV 362 describes the benefits of the Apache Spark unified platform and shows how to build data pipeline applications using Spark Streaming, Spark SQL, Spark GraphX, and MLlib. The concepts are taught through Scala scenarios that also form the basis of the hands-on labs.

Prerequisites

  • Completion of ESS 100, ESS 101
  • Basic Hadoop knowledge and intermediate linux knowledge
  • Experience using a text editor such as vi
  • Terminal program installed; familiarity with command-line options such as mv, cp, ssh, grep, cd, and useradd
  • Knowledge of functional programming with Scala or Python, and experience with SQL


This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.


Lesson 7 - Introduction to Apache Spark Data Pipelines

  • Identify Spark Unified Stack Components
  • List Benefits of the Apache Spark Unified Stack Over the Hadoop Ecosystem
  • Describe Spark Data Pipeline Use Cases

Lesson 8 - Create an Apache Spark Streaming Application

  • Describe Spark Streaming Architecture
  • Create DStreams and a Spark Streaming Application
  • Lab 8: Create a Spark Streaming Application
  • Apply Operations on DStreams
  • Define Windowed Operations
  • Describe How Streaming Applications Are Fault-Tolerant
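As a rough preview of the Lesson 8 topics, the sketch below creates a DStream, applies a stateless operation, and applies a windowed operation using the DStream API. It is a minimal illustration, not course material: the object name, the "sensorId,value" record format, the input and checkpoint directories, and the 100.0 threshold are all hypothetical assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.Try

object SensorStreamSketch {
  // Hypothetical record format: one "sensorId,value" pair per line.
  // Returns None for lines that do not match the format.
  def parseReading(line: String): Option[(String, Double)] =
    line.split(",") match {
      case Array(id, value) => Try(value.trim.toDouble).toOption.map(v => (id.trim, v))
      case _                => None
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SensorStreamSketch").setMaster("local[2]")
    // Each micro-batch covers 2 seconds of input.
    val ssc = new StreamingContext(conf, Seconds(2))
    // Hypothetical checkpoint directory; Spark persists stream state here for recovery.
    ssc.checkpoint("/tmp/sensor-checkpoint")

    // DStream over new files written to a hypothetical directory.
    val lines = ssc.textFileStream("/tmp/sensor-input")
    val readings = lines.flatMap(line => parseReading(line).toList)

    // Stateless transformation: keep readings above a hypothetical threshold.
    val alerts = readings.filter { case (_, value) => value > 100.0 }

    // Windowed operation: readings per sensor over the last 30 s, recomputed every 10 s.
    // Window and slide durations must be multiples of the 2 s batch interval.
    val countsPerSensor = readings
      .map { case (id, _) => (id, 1) }
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    alerts.print()
    countsPerSensor.print()

    ssc.start()            // start processing batches
    ssc.awaitTermination() // block until the streaming job stops
  }
}
```

Keeping the parsing logic in a plain function means malformed lines are dropped before they reach the windowed aggregation, and the function can be checked without starting a StreamingContext.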

Lesson 9 - Use Apache Spark GraphX to Analyze Flight Data

  • Describe GraphX
  • Define Regular, Directed, and Property Graphs
  • Create a Property Graph
  • Perform Operations on Graphs
  • Lab 9: Use Apache Spark GraphX
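To illustrate the property-graph ideas in Lesson 9, here is a small GraphX sketch in which airports are vertices and routes are directed edges carrying a distance property. The airport codes, vertex ids, and distances are made-up sample values, not the flight data used in the course, and the object name is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object FlightGraphSketch {
  // Illustrative vertices: (vertex id, airport code). Not course data.
  val airports: Seq[(VertexId, String)] = Seq((1L, "SFO"), (2L, "ORD"), (3L, "DFW"))
  // Illustrative directed edges; the edge property is a made-up distance in miles.
  val routes: Seq[Edge[Int]] = Seq(Edge(1L, 2L, 1800), Edge(2L, 3L, 800), Edge(3L, 1L, 1400))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FlightGraphSketch").setMaster("local[*]"))

    // Property graph: vertices carry airport codes, edges carry distances.
    // "UNKNOWN" is the default attribute for any vertex referenced only by an edge.
    val graph = Graph(sc.parallelize(airports), sc.parallelize(routes), "UNKNOWN")

    println(s"airports: ${graph.numVertices}, routes: ${graph.numEdges}")

    // Triplets combine source vertex, destination vertex, and edge properties.
    graph.triplets
      .filter(t => t.attr > 1000)
      .map(t => s"${t.srcAttr} -> ${t.dstAttr}: ${t.attr} mi")
      .collect()
      .foreach(println)

    // Structural operator: number of inbound routes per airport.
    graph.inDegrees.collect().foreach { case (id, deg) => println(s"vertex $id: $deg inbound") }

    sc.stop()
  }
}
```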

Lesson 10 - Use Apache Spark MLlib to Predict Flight Delays

  • Describe Apache Spark MLlib
  • Describe the Machine Learning Techniques
  • Use Collaborative Filtering to Predict User Choice
  • Lab 10: Use Apache Spark MLlib to Make Recommendations
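For the collaborative-filtering topic in Lesson 10, the following sketch trains an alternating least squares (ALS) model with the RDD-based MLlib API, predicts one rating, and asks for recommendations. The ratings, the rank/iterations/regularization values, and the object name are illustrative assumptions; the lab itself may use different data and parameters.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object RecommendSketch {
  // Illustrative (user, item, rating) triples. Not course data.
  val sampleRatings: Seq[Rating] = Seq(
    Rating(1, 101, 5.0), Rating(1, 102, 3.0),
    Rating(2, 101, 4.0), Rating(2, 103, 2.0),
    Rating(3, 102, 4.5), Rating(3, 103, 1.0))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RecommendSketch").setMaster("local[*]"))
    val ratings = sc.parallelize(sampleRatings)

    // ALS matrix factorization: rank 2 latent factors, 10 iterations,
    // regularization 0.01 (illustrative values, not tuned).
    val model = ALS.train(ratings, 2, 10, 0.01)

    // Predict a rating for a (user, item) pair that user 1 has not rated.
    println(f"user 1, item 103: ${model.predict(1, 103)}%.2f")

    // Training-set mean squared error as a rough fit check.
    val predicted = model.predict(ratings.map(r => (r.user, r.product)))
      .map(r => ((r.user, r.product), r.rating))
    val mse = ratings.map(r => ((r.user, r.product), r.rating))
      .join(predicted)
      .map { case (_, (actual, pred)) => math.pow(actual - pred, 2) }
      .mean()
    println(s"training MSE: $mse")

    // Top-2 items for user 2; this call does not exclude items the user already rated.
    model.recommendProducts(2, 2).foreach(println)

    sc.stop()
  }
}
```

Training-set error only shows how well the factors fit the known ratings. A held-out split would be needed to estimate how well the model predicts unseen choices.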