Apache Sqoop

Hadoop users often want to analyze data across multiple sources and formats, and a common source is a relational database or data warehouse. Sqoop lets users efficiently move structured data from these sources into Hadoop, where it can be analyzed and correlated with other data types, such as semi-structured and unstructured data stored in the distributed file system. Once the analysis is complete, Sqoop can push any resulting structured data back into a database or data warehouse so that it is available for operational use.
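As a rough illustration of the round trip described above, the following Python sketch builds the command lines for a Sqoop import and a later export without running them. The JDBC URL, user name, table names, and directories are hypothetical placeholders, and the helper functions are not part of Sqoop; only the `sqoop import` and `sqoop export` tools and the flags shown come from the Sqoop command line.

```python
import shlex


def sqoop_import_args(jdbc_url, username, table, target_dir, num_mappers=4):
    """Build the argument list for a `sqoop import` run (nothing is executed)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                                # prompt for the password at run time
        "--table", table,                    # source table in the database
        "--target-dir", target_dir,          # destination directory in the cluster
        "--num-mappers", str(num_mappers),   # number of parallel map tasks
    ]


def sqoop_export_args(jdbc_url, username, table, export_dir):
    """Build the argument list for a `sqoop export` run (nothing is executed)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,                    # destination table in the database
        "--export-dir", export_dir,          # directory holding the result files
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    url = "jdbc:mysql://db.example.com/sales"
    print(shlex.join(sqoop_import_args(url, "analyst", "orders", "/data/sales/orders")))
    print(shlex.join(sqoop_export_args(url, "analyst", "order_summary",
                                       "/data/sales/order_summary")))
```

The printed commands could be pasted into a shell on a node where Sqoop is installed, or the lists could be handed to `subprocess.run`, assuming the target table for the export already exists in the database.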

Sqoop relies on parallel processing for its efficiency, using multiple cluster nodes simultaneously. It also provides an API for building custom connectors that integrate with new data sources. Sqoop integrates out of the box with popular relational databases and data warehouses, such as MySQL, Oracle, PostgreSQL, Teradata, and Netezza.
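To give a feel for how a parallel transfer can divide the work, the sketch below splits the value range of a numeric key column into contiguous slices, one per parallel task. It is a simplified illustration under the assumption of an integer key with known minimum and maximum values, not Sqoop's actual splitting code; the function name `split_ranges` is hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive range [lo, hi] into contiguous inclusive slices.

    Returns at most `num_mappers` (start, end) pairs whose sizes differ by
    no more than one, so each parallel task reads a similar share of rows.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")

    total = hi - lo + 1
    parts = min(num_mappers, total)      # never create empty slices
    base, extra = divmod(total, parts)   # the first `extra` slices get one more key

    ranges = []
    start = lo
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Each pair could back a query such as: WHERE id >= start AND id <= end
    for start, end in split_ranges(1, 100, 4):
        print(start, end)
```

Even slices only balance the load when key values are spread fairly uniformly; a heavily skewed key column would leave some tasks with far more rows than others.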


Installing Sqoop on MapR

Getting started with Sqoop on MapR

Release Notes for Sqoop on MapR

Apache Sqoop User Guide

Apache Sqoop Developer's Guide

Apache Sqoop Issue Tracker

Apache Sqoop Mailing Lists

Download Sandbox for Hadoop

GitHub - MapR

MapR Developer Central