Partner App: Tungsten Replicator

Tungsten Replicator 3.0 provides real-time replication from MySQL and Oracle into Hadoop, translating row changes from your transactional store into change data within Hadoop. The change data can either be used directly as change information, or materialised into a view of the data that creates carbon-copy tables within Hive.

Application Description

Tungsten Replicator reads data from the transactional SQL store using either the binary log (MySQL) or Oracle Change Data Capture (CDC), converting the individual transactions into row-based events. This row data is then replicated into HDFS by writing change data in CSV format. Tungsten Replicator includes DDL translation tools to convert MySQL or Oracle DDL into Hive format, and a materialisation process that translates the source transactional table data into corresponding table data within Hive. Tungsten Replicator is open source software. The separate Hadoop-specific tools are provided through GitHub.
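
To make the idea of "row-based events written as CSV change data" concrete, here is a minimal Python sketch. It is an illustration only, not the replicator's code. The column layout (an I/D opcode, a sequence number, a row id, then the column values) and the treatment of an update as a delete of the old row followed by an insert of the new row are assumptions made for this example; the function names are hypothetical.

```python
import csv
import io


def change_rows(seqno, op, before=None, after=None):
    """Turn one row event into (opcode, seqno, row_id, *columns) tuples.

    Assumed layout, for illustration only:
      opcode  "I" for an inserted row image, "D" for a deleted one
      seqno   sequence number of the source transaction
      row_id  position of the row within that transaction
    An update is written as a delete of `before` plus an insert of `after`.
    """
    if op not in ("insert", "update", "delete"):
        raise ValueError("unknown operation: %r" % (op,))
    if op in ("delete", "update") and before is None:
        raise ValueError("%s needs the old row image" % op)
    if op in ("insert", "update") and after is None:
        raise ValueError("%s needs the new row image" % op)

    rows = []
    if op in ("delete", "update"):
        rows.append(("D", seqno, len(rows) + 1, *before))
    if op in ("insert", "update"):
        rows.append(("I", seqno, len(rows) + 1, *after))
    return rows


def to_csv(rows):
    """Serialise change rows as CSV text, one change per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Under these assumptions, an update of row `(1, "old")` to `(1, "new")` in transaction 7 becomes two CSV lines: a `D` line carrying the old values and an `I` line carrying the new ones.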

Component           Version     Connection Method
MapR Distribution   3.0, 3.1    HDFS API
Hive                0.11+       HiveQL Interface

Application Version: 3.0

Installation instructions

A basic outline for installation is:

  1. Install the master replicator to extract information from your transactional store.
  2. Install the slave replicator to apply data into HDFS within your Hadoop cluster.
  3. Download the Hadoop tools from https://github.com/continuent/continuent-tools-hadoop/
    These tools provide five separate functions:
    a. Generate staging table DDL within Hive.
    b. Generate live table DDL within Hive.
    c. Generate a suitable Sqoop statement to provision any existing data.
    d. Materialise the tables from the change data into the carbon-copy tables.
    e. Compare the data in the current live transactional tables with the Hive tables.
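
The materialisation and comparison steps (d and e) can be pictured with a small in-memory Python sketch. This is not how the Hadoop tools are implemented; it only shows the logic of replaying ordered change records into a carbon-copy table and then checking that copy against the source. The record layout `(opcode, seqno, row_id, *columns)`, the use of the first column as the primary key, and both function names are assumptions made for illustration.

```python
def materialise(changes, key_index=0):
    """Replay change records in order and return the resulting table.

    `changes` is an iterable of (opcode, seqno, row_id, *columns) tuples
    (assumed layout). Records are applied in (seqno, row_id) order, so a
    later change to the same key always wins.
    Returns a dict mapping primary key -> tuple of column values.
    """
    table = {}
    ordered = sorted(changes, key=lambda r: (r[1], r[2]))
    for opcode, seqno, row_id, *cols in ordered:
        key = cols[key_index]
        if opcode == "D":
            table.pop(key, None)      # row removed (or about to be replaced)
        elif opcode == "I":
            table[key] = tuple(cols)  # new or updated row image
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))
    return table


def differences(source, copy):
    """Return the set of keys whose rows differ between two tables.

    Both arguments are dicts of primary key -> row tuple. A key that is
    present on only one side counts as a difference.
    """
    keys = source.keys() | copy.keys()
    return {k for k in keys if source.get(k) != copy.get(k)}
```

An empty result from `differences` means the carbon copy matches the source for every key; any returned keys identify rows that are missing, extra, or stale in the copy.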

Full details of the process are available in the documentation: https://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html

Use Instructions

Tungsten Replicator is an active background process. As long as the replicator is running and data changes are being logged in your master transactional database, the replicator will continue to replicate data into Hadoop.

The materialisation process must be run regularly to translate the change data into carbon copy tables. This can be managed through a simple cron job or Oozie workflow.
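
When the materialisation is scheduled from cron, one practical concern is that a slow run may still be in progress when the next one starts. The Python sketch below shows one possible guard: a wrapper that runs a command only if no other run holds a lock file. The function name and the lock-file approach are assumptions for illustration, and the command to run is left to the caller rather than naming any specific tool.

```python
import os
import subprocess


def run_exclusive(command, lock_path):
    """Run `command` unless another run already holds `lock_path`.

    `command` is an argument list as accepted by subprocess.run.
    Returns the command's exit code, or None if the run was skipped
    because the lock file already exists.
    """
    try:
        # O_EXCL makes creation fail if the file exists, so only one
        # process can take the lock at a time.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None  # a previous run is still in progress; skip this one

    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))  # record the owner for debugging
        return subprocess.run(command).returncode
    finally:
        os.remove(lock_path)  # release the lock even if the command failed
```

A scheduled script could call `run_exclusive` with the materialisation command and a fixed lock path. Note that a lock file left behind by a crashed run has to be removed by hand before runs resume.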

https://www.youtube.com/watch?v=NSUfXeIoAmc&feature=youtu.be

Support Information

Support for open-source users and developers is provided through the mailing list.

Paid support options are available through the Continuent Support portal.

mc.brown@continuent.com