Query and Store Data with Apache Hive


About this Course

This course begins with a review of SQL-on-Hadoop tools, then covers how to create, load, query, and manipulate tables in Hive. You will learn how Apache Hive fits in the Hadoop ecosystem, how to create and load tables, and how to query structured data with the Hive Query Language (HiveQL) without writing MapReduce code. Taken together with DA 450 - Transform Data with Apache Pig, this course shows how to use Pig and Hive as part of a single data flow in a Hadoop cluster.

What’s Covered in the Course

1: Hive in the Hadoop Ecosystem
  • Hive Use Cases
  • Steps in the Data Pipeline
  • Hive in the Hadoop Ecosystem
  • Data Types Used With Hive
Lab Activities
    • Connect to the Hive CLI
    • Cast Data
2: Create and Load Data
  • Create Databases and Internal Tables
  • Create External Tables and Partitioned Tables
  • Load Data Into Tables and Databases
  • Alter and Drop Tables
Lab Activities
    • Create a Database
    • Create a Simple Table
    • Create Partitioned and External Tables
    • Load Data Into Tables
    • Examine Databases and Tables
3: Query and Manipulate Data
  • Query, Sort, and Filter Data
  • Manipulate Data With User-Defined Functions
  • Combine and Store Tables
Lab Activities
    • Query Data With SELECT
    • Query Data With UDFs
    • Combine and Store Data