HParser is a data transformation (data handler) environment optimized for Hadoop. This codeless parsing software lets you process any file format inside Hadoop at scale. It gives Hadoop developers out-of-the-box parsing capabilities for a wide variety of complex data sources, including logs, industry standards, documents, and binary or hierarchical data.

MapR has partnered with Informatica to provide the Community Edition of HParser:

  • The HParser package can be downloaded from Informatica as a Zip archive that includes the HParser Engine, the Data Transformation HParser Jar file, HParser Studio, and the HParser Operator Guide.
  • The HParser Engine is also available as an RPM from the MapR repository, which makes it easier to install on every node in the cluster.

HParser can be installed on a MapR cluster running CentOS or Red Hat Enterprise Linux.

Installing HParser on a MapR Cluster

  1. Register on the Informatica site.
  2. Download the Zip file containing the Community Edition of HParser, and extract it.
  3. Familiarize yourself with the installation procedure in the HParser Operator Guide.
  4. On each node, install the HParser Engine from the MapR repository by running the following command as root or with sudo:
    yum install hparser-engine
  5. Choose a Command Node: the node in the cluster from which you will issue HParser commands.
  6. Following the instructions in the HParser Operator Guide, copy the HParser Jar file to the Command Node and create the HParser configuration file.
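
Because step 4 has to be repeated on each node, it can be convenient to script it on larger clusters. The following is a minimal sketch, not part of the HParser documentation. It assumes a hypothetical file nodes.txt with one hostname per line, passwordless SSH access as root, and the MapR repository already configured on every node. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: install hparser-engine on every node listed in a file.
# Assumptions (not from the HParser docs): NODES_FILE is a hypothetical
# file with one hostname per line, passwordless SSH as root works, and
# the MapR repository is already configured on each node.

NODES_FILE="${NODES_FILE:-nodes.txt}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command that would be run for one node.
install_cmd() {
    printf 'ssh root@%s yum install -y hparser-engine\n' "$1"
}

# Run (or, in dry-run mode, print) the install for each listed node.
install_all() {
    while IFS= read -r node; do
        # Skip blank lines in the node list.
        [ -z "$node" ] && continue
        if [ "$DRY_RUN" = "1" ]; then
            install_cmd "$node"
        else
            # -n keeps ssh from consuming the rest of the node list on stdin.
            ssh -n "root@$node" yum install -y hparser-engine
        fi
    done < "$NODES_FILE"
}

# Only act when the node list exists.
if [ -f "$NODES_FILE" ]; then
    install_all
fi
```

Review the printed commands first, then set DRY_RUN=0 to perform the installation. If you log in as a non-root user, the remote command would need to be prefixed with sudo instead.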