Dataware for data-driven transformation

Big Data Essentials


About this Course

This course series introduces students to the basics of big data computing, the Apache Hadoop ecosystem, and the MapR Converged Data Platform. It covers big data concepts and shows how different tools and roles help solve real-world big data problems. By the end of the course, students will have some of the foundational knowledge required for other MapR Academy courses.

Duration: 1 day

What’s Covered in the Course

1: Introduction to Big Data (ESS 100)
  • Define Big Data
  • Summarize the History of Big Data Computing
  • Define Key Terms in Big Data Computing
Lab Activities
    • No labs
2: The Big Data Pipeline
  • Organize the Steps in the Big Data Pipeline
  • Explain the Role of Administrators
  • Explain the Role of Developers
  • Explain the Role of Data Analysts
Lab Activities
    • No labs
3: Solving Big Data Problems
  • Data Warehouse Optimization
  • Recommendation Engine
  • Large-Scale Log Analysis
Lab Activities
    • No labs
4: Core Elements of Apache Hadoop (ESS 101)
  • Compare and Contrast Local and Distributed File Systems
  • Explain Data Management in the Hadoop File System
  • Summarize the MapReduce Algorithm
Lab Activities
    • No labs
5: The Apache Hadoop Ecosystem
  • Define the Apache Hadoop Ecosystem
  • Administration: Apache ZooKeeper, YARN
  • Ingestion: Apache Flume, Apache Oozie, Apache Sqoop
  • Processing: Apache Spark, Apache HBase, Apache Pig
  • Analysis: Apache Hive, Apache Drill, Apache Mahout
Lab Activities
    • No labs
6: Introduction to the MapR Converged Data Platform (ESS 102)
  • Define the MapR Converged Data Platform
  • Explain Key Features and Benefits of the MapR Converged Data Platform
  • Understand Use Cases for the MapR Converged Data Platform
Lab Activities
    • No labs