Getting Started with Apache Spark
What is Apache Spark?
What is Spark?
Who Uses Spark?
What is Spark Used For?
How to Install Apache Spark
A Very Simple Spark Installation
Testing Spark
Apache Spark Architectural Overview
Development Language Support
Deployment Options
Storage Options
The Spark Stack
Resilient Distributed Datasets (RDDs)
API Overview
The Power of Data Pipelines
Benefits of Hadoop and Spark
Hadoop vs. Spark - An Answer to the Wrong Question
What Hadoop Gives Spark
What Spark Gives Hadoop
Solving Business Problems with Spark
Processing Tabular Data with Spark SQL
Sample Dataset
Computing User Profiles with Spark
Delivering Music
Spark Streaming Framework and Processing Models
The Details of Spark Streaming
The Spark Driver
Processing Models
Picking a Processing Model
Spark Streaming vs. Others
Performance Comparisons
Current Limitations
Putting Spark into Production
Breaking it Down
Spark and Fighter Jets
Learning to Fly
Assessment
Planning for the Coexistence of Spark and Hadoop
Advice and Considerations
Spark In-Depth Use Cases
Building a Recommendation Engine with Spark
Collaborative Filtering with Spark
Loading Data into Spark DataFrames
Explore and Query with Spark DataFrames
Using ALS with the Movie Ratings Data
Making Predictions
Evaluating the Model
Unsupervised Anomaly Detection with Spark
Machine Learning Library (MLlib) with Spark
Dissecting a Classic by the Numbers
Getting Started with Apache Spark Conclusion
Apache Spark Developer Cheat Sheet
Transformations (return new RDDs – Lazy)
Actions (return values – NOT Lazy)
Persistence Methods
Additional Transformations and Actions
Extended RDDs w/ Custom Transformations and Actions
Streaming Transformations
RDD Persistence
Shared Data
MLlib Reference
Other References