DEV 362 - Create Data Pipelines Using Apache Spark

About this Course

This course teaches you how to build data pipeline applications using Spark Streaming, Spark SQL, Spark GraphX, and MLlib. You’ll learn about Spark Streaming architecture, data pipeline use cases, DStreams, and property graph operations.

This is the third course in the Apache Spark v1.6 Series from MapR. The update for Apache Spark v2.1 is coming soon.

What’s Covered

Course Lessons and Lab Activities

Lesson 7: Introduction to Apache Spark Data Pipelines
  Topics:
    • Components of Apache Spark Unified Stack
    • Benefits of Apache Spark Over Hadoop Ecosystem
    • Data Pipeline Use Cases

Lesson 8: Create an Apache Spark Streaming Application
  Topics:
    • Spark Streaming Architecture
    • DStreams and a Spark Streaming Application
    • Operations on DStream
    • Window Operations
    • Fault-tolerant Streaming Applications
  Lab Activities:
    • Build a Streaming Application that Writes to HBase
    • Build a Streaming Application with SQL
    • Build a Streaming Application with Windows and SQL

Lesson 9: Use Apache Spark GraphX
  Topics:
    • GraphX Overview
    • Regular, Directed and Property Graphs
    • Property Graph Creation
    • Operations on Graphs
  Lab Activities:
    • Create a Property Graph
    • Apply Graph Operations

Lesson 10: Use Apache Spark MLlib
  Topics:
    • Spark MLlib Overview
    • Machine Learning Techniques for Classification, Clustering, and Collaborative Filtering
    • Collaborative Filtering to Predict User Choice
  Lab Activities:
    • Load and Inspect Data using the Spark Shell
    • Use the Spark MLlib to Make Movie Recommendations

Get Certified

This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.

Prerequisites

  • Completion of the on-demand courses ESS 100-102 and DEV 360-361 of the Apache Spark v1.6 series
  • Basic to intermediate Linux knowledge
  • Experience using a Linux text editor such as vi and Linux commands like mv, cp, ssh, grep, cd, and useradd
  • Knowledge of application development principles, functional programming, and Scala or Python
  • Basic fluency with SQL