Big Data Essentials


About this Course

This course series introduces students to the basics of big data computing, the Apache Hadoop ecosystem, and the MapR Data Platform. It covers core big data concepts and shows how different tools and roles help solve real-world big data problems. By the end, students will have some of the foundational knowledge required for other MapR Academy courses.

Duration: 1 day

What’s Covered in the Course

1: Introduction to Big Data (ESS 100)
  • Define Big Data
  • Summarize the History of Big Data Computing
  • Define Key Terms in Big Data Computing
2: The Big Data Pipeline
  • Organize the Steps in the Big Data Pipeline
  • Explain the Role of Administrators
  • Explain the Role of Developers
  • Explain the Role of Data Analysts and Data Scientists
3: Core Elements of Apache Hadoop (ESS 101)
  • Compare and Contrast Local and Distributed File Systems
  • Explain Data Management in the Hadoop File System
  • Summarize the MapReduce Algorithm
4: The Apache Hadoop Ecosystem
  • Define the Apache Hadoop Ecosystem
  • Administration: Apache ZooKeeper, Apache Hadoop YARN
  • Ingestion: Apache Flume, Apache Oozie, Apache Sqoop
  • Processing: Apache Spark, Apache HBase, Apache Pig
  • Analysis: Apache Hive, Apache Drill, Apache Mahout
5: Solving Big Data Problems
  • Data Warehouse Optimization
  • Recommendation Engine
  • Large-Scale Log Analysis
6: Introduction to the MapR Data Platform (ESS 102)
  • Define the MapR Data Platform
  • Explain Key Features and Benefits of the MapR Data Platform
  • Understand Use Cases for the MapR Data Platform