New API in Pig 0.16.0

Pig 0.16.0 includes the following new classes and interfaces.

New Classes

Class Description
org.apache.pig.piggybank.evaluation.string.REPLACE_MULTI REPLACE_MULTI implements an eval function that replaces all occurrences of the search keys with their replacement values. The replacement values are specified in a map. For example:
input_data = LOAD 'input_data' as (name);
-- name = 'Hello World!'
replaced_name = FOREACH input_data GENERATE REPLACE_MULTI ( name, [ ' '#'_', '!'#'', 'e'#'a', 'o'#'oo' ] );
-- replaced_name = Halloo_Woorld
The first argument is the source string on which the REPLACE_MULTI operation is performed. The second argument is a map of search keys to replacement values.

This is a Pig loader that can load Apache HTTPD access logs written in (almost) any Apache HTTPD LogFormat.

Basic usage: Feed the loader your (custom) logformat specification and it will show the fields that can be extracted from this logformat.
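
The REPLACE_MULTI behavior described above can be illustrated in plain Java. This is a minimal sketch, not the piggybank implementation: the class name ReplaceMultiSketch and the method replaceMulti are hypothetical, and it assumes the pairs are applied one at a time in map iteration order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the UDF; not Pig's actual implementation.
class ReplaceMultiSketch {

    // Applies each (search key -> replacement) pair to the whole string,
    // one pair at a time, in the iteration order of the map (an assumption).
    static String replaceMulti(String source, Map<String, String> replacements) {
        if (source == null || replacements == null) {
            return source;
        }
        String result = source;
        for (Map.Entry<String, String> pair : replacements.entrySet()) {
            result = result.replace(pair.getKey(), pair.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Same pairs as the Pig Latin example: ' '#'_', '!'#'', 'e'#'a', 'o'#'oo'
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put(" ", "_");
        replacements.put("!", "");
        replacements.put("e", "a");
        replacements.put("o", "oo");
        System.out.println(replaceMulti("Hello World!", replacements)); // prints Halloo_Woorld
    }
}
```

Because the pairs are applied sequentially, a replacement value that contains a later search key would be rewritten again; the example pairs do not interact in that way.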

org.apache.pig.CounterBasedErrorHandler Handles errors thrown by the StoreFuncInterface.putNext().
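
One plausible reading of "counter based" is that failed and successful records are counted and the job fails only once failures pass a limit. The sketch below illustrates that idea only. The class CountingErrorHandlerSketch, its methods, and both thresholds are hypothetical and are not taken from Pig's source or configuration.

```java
// Hypothetical sketch; names and thresholds are illustrative, not Pig's API.
class CountingErrorHandlerSketch {
    private final long minErrors;       // always tolerate this many failed records
    private final double maxErrorRatio; // beyond that, fail when errors / total exceeds this
    private long successes;
    private long errors;

    CountingErrorHandlerSketch(long minErrors, double maxErrorRatio) {
        this.minErrors = minErrors;
        this.maxErrorRatio = maxErrorRatio;
    }

    // Called after a record was stored successfully.
    void onSuccess() {
        successes++;
    }

    // Called when storing a record failed; rethrows once both limits are exceeded.
    void onError(Exception cause) {
        errors++;
        long total = successes + errors;
        if (errors > minErrors && (double) errors / total > maxErrorRatio) {
            throw new RuntimeException(
                    "Too many failed records: " + errors + " of " + total, cause);
        }
    }

    long errorCount() {
        return errors;
    }
}
```
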
org.apache.pig.backend.hadoop.HKerberos Support for logging in using a Kerberos keytab file.

Kerberos is an authentication system that uses tickets with limited validity time. Running a Pig script on a Kerberos secured Hadoop cluster limits the running time to at most the remaining validity time of the Kerberos tickets. When doing really complex analytics, this may become a problem as the job may need to run for a longer time than these ticket times allow. A Kerberos keytab file is a Kerberos specific form of the password of a user. It is possible to enable a Hadoop job to request new tickets when they expire by creating a keytab file and making it part of the job that is running in the cluster. This will extend the maximum job duration beyond the maximum renew time of the Kerberos tickets.
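
The ticket-lifetime limit can be made concrete with a little arithmetic. The following sketch is a simplification and not part of HKerberos: it assumes a single fixed total ticket validity (renewals included), and the names TicketLifetimeSketch and extraLoginsNeeded are hypothetical.

```java
import java.time.Duration;

// Hypothetical helper; models ticket expiry only, it performs no Kerberos calls.
class TicketLifetimeSketch {

    // Number of fresh logins from the keytab, after the initial login,
    // needed so that a job of the given duration is always covered by a valid ticket.
    static long extraLoginsNeeded(Duration jobDuration, Duration ticketValidity) {
        if (ticketValidity.isZero() || ticketValidity.isNegative()) {
            throw new IllegalArgumentException("ticket validity must be positive");
        }
        long job = jobDuration.toMillis();
        long ticket = ticketValidity.toMillis();
        if (job <= ticket) {
            return 0; // the job finishes before the first ticket expires
        }
        return (job + ticket - 1) / ticket - 1; // ceil(job / ticket) - 1
    }

    public static void main(String[] args) {
        // A 30-hour job with tickets valid for 10 hours in total.
        System.out.println(extraLoginsNeeded(Duration.ofHours(30), Duration.ofHours(10))); // prints 2
    }
}
```

Without a keytab the result must be 0 for the job to succeed; with a keytab each extra login can be performed by the job itself.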

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigWritableComparators Byte-only raw comparators for faster comparison in non-orderby jobs. This does not reuse JobControlCompiler.Pig<DataType>WritableComparator, which extends PigWritableComparator. That comparator is less efficient in cases such as tuples, where the tuple is iterated for null checking instead of taking advantage of TupleRawComparator.hasComparedTupleNull(). This also skips multi-query index checking.
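
The idea of a byte-only raw comparator can be sketched as follows: serialized keys are compared directly as unsigned bytes, without deserializing them. This is an illustration, not the code in PigWritableComparators. The class name RawBytesComparatorSketch is hypothetical, and the parameter layout (array, start offset, length for each side) merely mirrors the usual Hadoop raw comparator shape.

```java
// Hypothetical sketch of a byte-only comparison; not Pig's implementation.
class RawBytesComparatorSketch {

    // Compares two byte ranges lexicographically, treating bytes as unsigned.
    // Returns a negative, zero, or positive value.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int common = Math.min(l1, l2);
        for (int i = 0; i < common; i++) {
            int left = b1[s1 + i] & 0xff;
            int right = b2[s2 + i] & 0xff;
            if (left != right) {
                return left - right;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return l1 - l2;
    }
}
```

Such an ordering need not match the logical order of the values. That is acceptable when a job only requires equal keys to be grouped together, which is presumably why these comparators are limited to non-orderby jobs.
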
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator This class is used to decorate the StoreFunc#putNext(Tuple). It handles errors by calling OutputErrorHandler#handle(String, long, Throwable) if the StoreFunc implements ErrorHandling.
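
The decorator pattern described here can be sketched in a few lines. The types below are hypothetical stand-ins: RecordWriter plays the role of a StoreFunc and FailureHandler the role of an error handler. They are not Pig's interfaces and their signatures are assumptions.

```java
import java.io.IOException;

// Hypothetical sketch of decorating putNext with error handling; not Pig's classes.
class StoreDecoratorSketch {

    // Stand-in for a store function that writes one record at a time.
    interface RecordWriter {
        void putNext(String record) throws IOException;
    }

    // Stand-in for an error handler notified about each outcome.
    interface FailureHandler {
        void onSuccess();
        void onError(String record, Exception cause);
    }

    private final RecordWriter delegate;
    private final FailureHandler handler; // null: the writer did not opt in to error handling

    StoreDecoratorSketch(RecordWriter delegate, FailureHandler handler) {
        this.delegate = delegate;
        this.handler = handler;
    }

    // Forwards to the wrapped writer; failures go to the handler when one is present.
    void putNext(String record) throws IOException {
        try {
            delegate.putNext(record);
            if (handler != null) {
                handler.onSuccess();
            }
        } catch (IOException e) {
            if (handler == null) {
                throw e; // no handler: keep the original failure behavior
            }
            handler.onError(record, e);
        }
    }
}
```
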
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigInputFormatTez Extends org.apache.hadoop.mapreduce.InputFormat and implements Pig and Tez specific functions.
org.apache.pig.backend.hadoop.executionengine.tez.util.TezUDFContextSeparator Extends a visitor for the TezOperPlan class and serializes all (LoadFunc, StoreFunc, UserFunc). For historical reasons, Pig supports both .bz and .bz2 as bzip2 file extensions. This class returns the additional bzip2 file extension, .bz, as a string.
org.apache.pig.impl.util.UDFContextSeparator UDFContextSeparator extends PhyPlanVisitor, which is the visitor class for the Physical Plan. To use this, create the visitor with the plan to be visited. Call the visit() method to traverse the plan in a depth-first fashion.

This class also visits the nested plans inside the operators. Extend this class to modify the nature of each visit and to maintain any relevant state information between the visits to two different operators.
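
The traversal described above (depth first, including plans nested inside operators, with state kept between visits) can be sketched without any Pig classes. PlanOp and PlanVisitorSketch are hypothetical names used only for illustration. The real PhyPlanVisitor works on Pig's physical operators and has a different API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical operator node: successors in the plan plus roots of nested plans.
class PlanOp {
    final String name;
    final List<PlanOp> successors = new ArrayList<>();
    final List<PlanOp> nestedPlanRoots = new ArrayList<>();

    PlanOp(String name) {
        this.name = name;
    }
}

// Hypothetical depth-first visitor; subclasses override visitOp to change each visit.
class PlanVisitorSketch {
    final List<String> visited = new ArrayList<>(); // state kept between visits
    private final Set<PlanOp> seen = new HashSet<>();

    void visit(PlanOp root) {
        depthFirst(root);
    }

    private void depthFirst(PlanOp op) {
        if (!seen.add(op)) {
            return; // already visited through another path
        }
        visitOp(op);
        for (PlanOp inner : op.nestedPlanRoots) {
            depthFirst(inner); // descend into the operator's nested plan first
        }
        for (PlanOp next : op.successors) {
            depthFirst(next);
        }
    }

    protected void visitOp(PlanOp op) {
        visited.add(op.name);
    }
}
```

For a plan load -> foreach -> store in which foreach holds a nested plan with a single project operator, this visitor records load, foreach, project, store.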

org.apache.pig.parser.RegisterResolver Resolves a JAR with a scripting language or namespace. Makes a list of URIs of the downloaded JARs.

New Interfaces

Interface Description
org.apache.pig.ErrorHandler The interface that handles errors thrown by StoreFuncInterface.putNext(Tuple).
org.apache.pig.ErrorHandling The interface to enable handling of errors during StoreFunc#putNext(Tuple).