MapR 5.0 Documentation : Support for HBase Java Filters by the MapR-DB C APIs

The hb_get_set_filter() and hb_scanner_set_filter() APIs filter the results of GET and SCAN operations. Their signatures are:

int32_t hb_get_set_filter(hb_get_t get, const byte_t *filter, const int32_t filterLen);
int32_t hb_scanner_set_filter(hb_scanner_t scanner, const byte_t *filter, const int32_t filterLen);

Both functions take a filter expressed as a string, together with the length of that string. MapR-DB parses the string to construct the filter.
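As a minimal sketch of how the two filter arguments relate, the helper below derives both from one C string. The `filter_arg` type and `make_filter_arg` function are hypothetical and not part of the MapR-DB C API, and `unsigned char` stands in for `byte_t` on the assumption that `byte_t` is a byte-sized unsigned type. The length excludes the terminating NUL, which matches the `strlen()` calls used in the examples on this page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type; not part of the MapR-DB C API. */
typedef struct {
    const unsigned char *bytes; /* filter text viewed as raw bytes */
    int32_t len;                /* byte count, excluding the trailing '\0' */
} filter_arg;

/* Derives the (filter, filterLen) pair from a NUL-terminated filter string. */
static filter_arg make_filter_arg(const char *text) {
    assert(text != NULL);
    filter_arg arg;
    arg.bytes = (const unsigned char *)text;
    arg.len = (int32_t)strlen(text);
    return arg;
}
```

The two fields could then be passed as the second and third arguments of either function.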

Examples of using hb_get_set_filter() and hb_scanner_set_filter()

Both examples use this array of filters:

static char filters[][200] = {"RandomRowFilter(0.5)",
                             "ColumnCountGetFilter(2)",
                             "ColumnPaginationFilter(1)",
                             "ColumnPrefixFilter('column-a')",
                             "FamilyFilter(=,'binaryprefix:f')",
                             "PrefixFilter('row_') AND QualifierFilter(<,'binaryprefix:g')",
                             "SKIP TimestampsFilter(1392222222222)",
                             "WHILE ValueFilter(=,'binaryprefix:cell2_value_v1')",
                             "FuzzyRowFilter('row00','00001')",
                             "TimestampsFilter(1430937732000,1431024132000)"};

Example of filtering the results of a GET operation 

{
    bytebuffer rowKey = bytebuffer_strcpy("row_with_two_cells");
    hb_get_t get = NULL;
    hb_get_create(rowKey->buffer, rowKey->length, &get);
    hb_get_add_column(get, FAMILIES[0], 1, NULL, 0);
    hb_get_add_column(get, FAMILIES[1], 1, NULL, 0);
    hb_get_set_table(get, table_name, table_name_len);
    hb_get_set_num_versions(get, 10); // up to ten versions of each column
    hb_get_set_filter(get, (byte_t *)filters[9], strlen(filters[9]));
    get_done = false;
    hb_get_send(client, get, get_callback, rowKey);
    wait_for_get();
}

Example of filtering the results of a SCAN operation

  for (uint32_t i = 0; i < num_filters; ++i) {
    hb_scanner_t scanner = NULL;
    hb_scanner_create(client, &scanner);
    hb_scanner_set_table(scanner, table_name, table_name_len);
    hb_scanner_set_num_max_rows(scanner, 3);  // maximum 3 rows at a time
    hb_scanner_set_num_versions(scanner, 10); // up to 10 versions of the cell
    hb_scanner_set_filter(scanner, (byte_t *)filters[i], strlen(filters[i]));
    hb_scanner_next(scanner, scan_callback, NULL); // dispatch the call
    wait_for_scan();
  }
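The loop above relies on num_filters, which the excerpt does not define. One possible definition, offered only as an assumption about what the original program did, computes the count from the array itself so that it stays correct when filters are added or removed. The `filter_length` helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char filters[][200] = {"RandomRowFilter(0.5)",
                              "ColumnCountGetFilter(2)",
                              "ColumnPaginationFilter(1)",
                              "ColumnPrefixFilter('column-a')",
                              "FamilyFilter(=,'binaryprefix:f')",
                              "PrefixFilter('row_') AND QualifierFilter(<,'binaryprefix:g')",
                              "SKIP TimestampsFilter(1392222222222)",
                              "WHILE ValueFilter(=,'binaryprefix:cell2_value_v1')",
                              "FuzzyRowFilter('row00','00001')",
                              "TimestampsFilter(1430937732000,1431024132000)"};

/* Assumed definition: number of entries, derived from the array size. */
static const size_t num_filters = sizeof(filters) / sizeof(filters[0]);

/* Hypothetical helper: length of filter i, as passed in the filterLen argument. */
static size_t filter_length(size_t i) {
    assert(i < num_filters);
    return strlen(filters[i]);
}
```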

Filter Format and Arguments

Filters are specified in the Thrift Filter Language and use this format: FilterName (argument, argument, ..., argument). Arguments that represent strings are enclosed in single quotation marks ('). Arguments that represent booleans, integers, or comparison operators (<, <=, =, !=, >, >=) are not enclosed in single quotation marks.
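The quoting rule above can be sketched with two small formatting helpers. Both functions are hypothetical and are not part of the MapR-DB C API. They only build the text of a single-argument filter, and they do not handle string arguments that themselves contain a single quotation mark.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes Name('arg') into out: string arguments are single-quoted.
 * Returns 0 on success, -1 if the result does not fit in cap bytes. */
static int format_string_filter(char *out, size_t cap,
                                const char *name, const char *arg) {
    assert(out != NULL && name != NULL && arg != NULL);
    int n = snprintf(out, cap, "%s('%s')", name, arg);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Writes Name(value) into out: integer arguments are not quoted.
 * Returns 0 on success, -1 if the result does not fit in cap bytes. */
static int format_int_filter(char *out, size_t cap,
                             const char *name, long long value) {
    assert(out != NULL && name != NULL);
    int n = snprintf(out, cap, "%s(%lld)", name, value);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

For instance, format_string_filter(buf, sizeof buf, "PrefixFilter", "row_") produces PrefixFilter('row_'), and format_int_filter(buf, sizeof buf, "PageFilter", 1) produces PageFilter(1).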

Binary Operators

You can combine filters by using the binary operators AND and OR. For example, PrefixFilter ('Row') AND PageFilter (1) AND FirstKeyOnlyFilter () returns all key-value pairs that match the following conditions:

  • The row containing the key-value must start with the prefix "Row".

  • The key-value must be located in the first row of the table.

  • The key-value must be the first key-value pair in the row.
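A combined expression such as the one above can be assembled from its parts. The helper below is a hypothetical sketch, not part of the MapR-DB C API. It wraps each combination in parentheses so that the grouping is explicit, on the assumption that a parenthesized combination is accepted wherever a single filter is.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "(left OP right)" into out, where op is "AND" or "OR".
 * out must not overlap left or right.
 * Returns 0 on success, -1 if the result does not fit in cap bytes. */
static int combine_filters(char *out, size_t cap, const char *left,
                           const char *op, const char *right) {
    assert(out != NULL && left != NULL && op != NULL && right != NULL);
    assert(strcmp(op, "AND") == 0 || strcmp(op, "OR") == 0);
    int n = snprintf(out, cap, "(%s %s %s)", left, op, right);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Calling the helper twice, with a separate output buffer each time, builds ((PrefixFilter('Row') AND PageFilter(1)) AND FirstKeyOnlyFilter()).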

For another example, (RowFilter (=, 'binary:Row 1') AND TimestampsFilter (74689, 89734)) OR ColumnRangeFilter ('abc', true, 'xyz', false) returns all key-value pairs that satisfy either of the following.

Match both of these conditions:

    • The key-value is in a row for which the row key is "Row 1".

    • The key-value has a timestamp of either 74689 or 89734.

Or match this condition:

    • The key-value is located in a column that is lexicographically greater than or equal to "abc" and less than "xyz".
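The range condition for ColumnRangeFilter ('abc', true, 'xyz', false) can be modeled as a lexicographic comparison with one inclusive and one exclusive bound. The function below is an illustrative model only, not MapR-DB code. It assumes NUL-terminated column names and treats a NULL bound as "no bound".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if name lies between min and max in byte-wise
 * lexicographic order. Each flag says whether its bound is included. */
static bool column_in_range(const char *name,
                            const char *min, bool min_inclusive,
                            const char *max, bool max_inclusive) {
    assert(name != NULL);
    if (min != NULL) {
        int c = strcmp(name, min);
        if (c < 0 || (c == 0 && !min_inclusive)) return false;
    }
    if (max != NULL) {
        int c = strcmp(name, max);
        if (c > 0 || (c == 0 && !max_inclusive)) return false;
    }
    return true;
}
```

With the bounds from the example, a column named "abc" is in range and a column named "xyz" is not.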

Unary Operators

You can also use the following unary operators with filters:

SKIP

For a particular row, if any of the key-values don’t pass the filter condition, the entire row is skipped.

For example, SKIP ValueFilter (=, 'binary:0') omits every row in which any value is not "0".


WHILE

Rows are tested in order against the filter condition. Rows that meet the condition are included in the result set. When a row fails to meet the condition, filter processing stops and no more rows are tested.
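The difference between the two unary operators can be illustrated with a small model over in-memory arrays. This is a sketch of the semantics described above, not the MapR-DB implementation. Values and row keys are plain integers here, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*int_pred)(int);

/* SKIP: a row is kept only if every value in it passes the condition. */
static bool skip_keeps_row(const int *values, size_t n, int_pred passes) {
    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (!passes(values[i])) return false; /* one failure drops the row */
    }
    return true;
}

/* WHILE: rows are tested in order; returns how many leading rows pass
 * before the first failure, after which no more rows are tested. */
static size_t while_prefix_length(const int *row_keys, size_t n, int_pred passes) {
    assert(row_keys != NULL || n == 0);
    size_t kept = 0;
    while (kept < n && passes(row_keys[kept])) ++kept;
    return kept;
}

/* Example conditions used to exercise the model. */
static bool is_zero(int v) { return v == 0; }
static bool below_ten(int k) { return k < 10; }
```

In this model a row with values {0, 5, 0} is skipped under is_zero, and row keys {1, 4, 12, 3} yield two rows under below_ten because testing stops at 12.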

Evaluation of Filters

When filters are combined with the binary operators, unary operators, or both, they are evaluated according to these rules:

  1. First, evaluate the contents of parentheses.

  2. Next, evaluate filters that use unary operators. Both SKIP and WHILE operators have the same precedence.

  3. Finally, evaluate filters that use the binary operators. AND has higher precedence than OR.

For example, in a filter of the form Filter1 AND Filter2 OR Filter3, Filter1 AND Filter2 is evaluated first and the result is X. Then, X OR Filter3 is evaluated.

For another example, a filter of the form Filter1 AND SKIP Filter2 OR Filter3 is evaluated in these steps:

  1. Evaluate SKIP Filter2 with the result being X.

  2. Evaluate Filter1 AND X with the result being Y.

  3. Evaluate Y OR Filter3.
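The precedence rules can be made concrete with a tiny recursive-descent parser that rewrites an expression with explicit parentheses. This is an illustrative sketch, not the parser that MapR-DB uses. It assumes space-separated tokens, treats every token other than AND, OR, SKIP, and WHILE as a filter name, and does not handle parentheses or filter arguments in the input. All names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 32
#define MAX_TEXT 256

typedef struct {
    char *tokens[MAX_TOKENS];
    int count;
    int pos;
} token_stream;

static int peek_is(const token_stream *ts, const char *word) {
    return ts->pos < ts->count && strcmp(ts->tokens[ts->pos], word) == 0;
}

/* unary := ("SKIP" | "WHILE") unary | FilterName */
static void parse_unary(token_stream *ts, char *out) {
    assert(ts->pos < ts->count);
    if (peek_is(ts, "SKIP") || peek_is(ts, "WHILE")) {
        const char *op = ts->tokens[ts->pos++];
        char operand[MAX_TEXT];
        parse_unary(ts, operand);
        snprintf(out, MAX_TEXT, "(%s %s)", op, operand);
    } else {
        snprintf(out, MAX_TEXT, "%s", ts->tokens[ts->pos++]);
    }
}

/* and_expr := unary ("AND" unary)*, so AND binds tighter than OR */
static void parse_and(token_stream *ts, char *out) {
    parse_unary(ts, out);
    while (peek_is(ts, "AND")) {
        char left[MAX_TEXT], right[MAX_TEXT];
        ts->pos++;
        snprintf(left, sizeof left, "%s", out);
        parse_unary(ts, right);
        snprintf(out, MAX_TEXT, "(%s AND %s)", left, right);
    }
}

/* or_expr := and_expr ("OR" and_expr)* */
static void parse_or(token_stream *ts, char *out) {
    parse_and(ts, out);
    while (peek_is(ts, "OR")) {
        char left[MAX_TEXT], right[MAX_TEXT];
        ts->pos++;
        snprintf(left, sizeof left, "%s", out);
        parse_and(ts, right);
        snprintf(out, MAX_TEXT, "(%s OR %s)", left, right);
    }
}

/* Writes a fully parenthesized form of expr into out (at least MAX_TEXT bytes). */
static void parenthesize(const char *expr, char *out) {
    char copy[MAX_TEXT];
    token_stream ts = { .count = 0, .pos = 0 };
    snprintf(copy, sizeof copy, "%s", expr);
    for (char *t = strtok(copy, " "); t != NULL && ts.count < MAX_TOKENS;
         t = strtok(NULL, " ")) {
        ts.tokens[ts.count++] = t;
    }
    parse_or(&ts, out);
}
```

Given Filter1 AND SKIP Filter2 OR Filter3, the sketch groups SKIP Filter2 first, then the AND, then the OR, which mirrors the three steps listed above.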

Compare Operators and Comparators for Comparison Filters

The comparison filters DependentColumnFilter, FamilyFilter, QualifierFilter, RowFilter, and ValueFilter use the following syntax:

filter(<compareOperator>, <comparatorType:Value>)

Compare Operators

The following compare operators are supported: <, <=, =, !=, >, >=

Comparators

There are four comparators:

BinaryComparator

Lexicographically compares against the specified byte array.

Values are byte arrays.

For example, with the > compare operator, binary:abc matches values that are lexicographically greater than "abc".

BinaryPrefixComparator

Lexicographically compares against the specified byte array, but only up to the length of that byte array.

Values are byte arrays.

For example, with the = compare operator, binaryprefix:abc matches values whose first 3 bytes are lexicographically equal to "abc".

RegexStringComparator

Compares against the specified byte array using the given regular expression.

You can use only the = and != compare operators with this comparator.

Values are regular expressions.

For example, regexstring:^ab.*yz$ matches values that begin with "ab" and end with "yz".

SubStringComparator

Tests whether the given substring appears in the specified byte array. The comparison is case insensitive.

You can use only the = and != compare operators with this comparator.

Values are strings.

For example, substring:abc123 matches values that contain the substring "abc123".
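The comparison-filter syntax and the operator restrictions described above can be sketched as a string builder that refuses invalid combinations. Both functions are hypothetical and not part of the MapR-DB C API. The comparator prefixes (binary, binaryprefix, regexstring, substring) are taken from the examples on this page, and the helper covers only the two-argument form filter(compareOperator, 'comparator:value').

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if op may be used with the given comparator prefix. */
static bool operator_allowed(const char *comparator, const char *op) {
    bool equality_only = strcmp(comparator, "regexstring") == 0 ||
                         strcmp(comparator, "substring") == 0;
    if (equality_only) {
        return strcmp(op, "=") == 0 || strcmp(op, "!=") == 0;
    }
    static const char *all_ops[] = {"<", "<=", "=", "!=", ">", ">="};
    for (size_t i = 0; i < sizeof all_ops / sizeof all_ops[0]; ++i) {
        if (strcmp(op, all_ops[i]) == 0) return true;
    }
    return false;
}

/* Writes Filter(op,'comparator:value') into out.
 * Returns 0 on success, -1 on an invalid operator or a too-small buffer. */
static int format_compare_filter(char *out, size_t cap, const char *filter,
                                 const char *op, const char *comparator,
                                 const char *value) {
    assert(out != NULL && filter != NULL && op != NULL);
    assert(comparator != NULL && value != NULL);
    if (!operator_allowed(comparator, op)) return -1;
    int n = snprintf(out, cap, "%s(%s,'%s:%s')", filter, op, comparator, value);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Rejecting an operator such as < for regexstring before the string is sent keeps the mistake on the client side, where it is easier to diagnose.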

Supported Filters

 

Each entry gives the filter name, its format, and a description.

ColumnCountGetFilter
Format: ColumnCountGetFilter(x)
Returns the first x columns in a row. Used for GET operations.

ColumnPaginationFilter
Format: ColumnPaginationFilter(x,y)
Returns the first x columns after skipping the number of columns, y, that is specified as the offset.

ColumnPrefixFilter
Format: ColumnPrefixFilter('prefix')
Returns only those key-values in columns that have names that start with the specified prefix. The column prefix must be of the form "qualifier".

ColumnRangeFilter
Format: ColumnRangeFilter('minColumn',boolean,'maxColumn',boolean)
Returns only those key-values that are in columns that have names between minColumn and maxColumn. For example, if minColumn is 'an' and maxColumn is 'be', the filter returns key-values from columns named 'ana' and 'bad', but not from columns named 'bed' or 'eye'. If minColumn is null, there is no lower bound. If maxColumn is null, there is no upper bound. The two boolean arguments indicate whether to include minColumn and maxColumn themselves; each boolean follows the bound that it applies to.

DependentColumnFilter
Format: DependentColumnFilter('family','qualifier')
Tries to locate the specified column in each row and returns all key-values that have the same timestamp as that column. If a row does not contain the specified column, none of the key-values in that row are returned.

FamilyFilter
Format: FamilyFilter(compareOperator,'comparator:value')
Filters by column family. If the comparison returns true, the filter returns all of the key-values in the matching column family.

FirstKeyOnlyFilter
Format: FirstKeyOnlyFilter()
Returns the first key-value from each row.

FirstKeyValueMatchingQualifiersFilter
Format: FirstKeyValueMatchingQualifiersFilter('qualifier_1','qualifier_2',...,'qualifier_n')
Serially compares each qualifier in a row with the given qualifiers. If the current qualifier matches any of the given qualifiers, the filter stops and includes the current row (up to the current qualifier) in the result set.

FuzzyRowFilter
Format: FuzzyRowFilter('rowkey','fuzzy_info')
Filters data based on a fuzzy row key and fast-forwards during scanning. It takes (row key, fuzzy info) pairs to match row keys, where fuzzy info is a byte array with 0 or 1 as its values:
  • 0 means that this byte in the provided row key is fixed: the row key's byte at the same position must match.
  • 1 means that this byte in the provided row key is not fixed: the row key's byte at this position can differ from the one in the provided row key.
For example, assume that the row key format is userId_actionId_year_month, where userId is 4 bytes long, actionId is 2 bytes, year is 4 bytes, and month is 2 bytes. To fetch all users that performed a certain action (encoded as "99") in January of any year, the pair (row key, fuzzy info) would be:
  row key = "????_99_????_01" (any value can be used in place of "?")
  fuzzy info = "\x01\x01\x01\x01\x00\x00\x00\x00\x01\x01\x01\x01\x00\x00\x00"
That is, the fuzzy info says that the matching mask is "????_99_????_01", where each ? can be any value.

InclusiveStopFilter
Format: InclusiveStopFilter('rowKey')
Returns all key-values that are in the rows up to and including the row that has the specified row key.

KeyOnlyFilter
Format: KeyOnlyFilter()
Returns the key component of each key-value.

MultipleColumnPrefixFilter
Format: MultipleColumnPrefixFilter('prefix_1','prefix_2',...,'prefix_n')
Returns the key-values from columns that have names that begin with any of the specified prefixes.

PageFilter
Format: PageFilter(pageSize)
Returns a number of rows equal to the specified page size.

PrefixFilter
Format: PrefixFilter('rowKey_prefix')
Returns the key-values from rows whose keys start with the specified row-key prefix.

QualifierFilter
Format: QualifierFilter(compareOperator,'comparator:value')
Filters by column. If the comparison returns true, the filter returns all of the key-values in the matching column.

RandomRowFilter
Format: RandomRowFilter(probability)
Filters by probability. For example, RandomRowFilter(0.25) means that there is a 1 in 4 chance that the filter will pick the first row, a 1 in 4 chance that the filter will pick the next row, and so on until all rows in the table have been processed in this way.

RowFilter
Format: RowFilter(compareOperator,'comparator:value')
Filters by row key. If the comparison returns true, the filter returns all of the key-values in the matching row.

SingleColumnValueExcludeFilter
Format: SingleColumnValueExcludeFilter('columnFamily','qualifier',compareOperator,'comparator:value')
Takes the same arguments and behaves the same as SingleColumnValueFilter. However, if the column is found and the condition passes, all of the columns of the row are returned except for the tested column value.

SingleColumnValueFilter
Format: SingleColumnValueFilter('columnFamily','qualifier',compareOperator,'comparator:value')
Takes a column family, a qualifier, a compare operator, and a comparator. If the specified column is not found, all of the columns of that row are emitted. If the column is found and the comparison with the comparator returns true, all of the columns of the row are emitted. If the condition fails, the row is not returned.

SkipFilter
Format: SKIP filter
See the description of the SKIP unary operator above.

TimestampsFilter
Format: TimestampsFilter(timestamp_1,timestamp_2,...,timestamp_n)
Returns the key-values that have timestamps that match any of the specified timestamps.

ValueFilter
Format: ValueFilter(compareOperator,'comparator:value')
Filters by key-value. If the comparison returns true, the filter returns the matching key-value.

WhileMatchFilter
Format: WHILE filter
See the description of the WHILE unary operator above.
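The fuzzy info described for FuzzyRowFilter can be derived mechanically from a pattern in which one character marks the non-fixed positions. The sketch below is illustrative only and is not part of the MapR-DB C API. It builds the 0/1 byte array and models the matching rule on in-memory keys. It assumes that a row key shorter than the pattern does not match, which the description above does not state, and it does not cover how the mask is encoded inside the filter string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fills info[i] with 1 where pattern[i] is the wildcard (byte not fixed)
 * and 0 elsewhere (byte fixed). info must hold strlen(pattern) bytes. */
static void build_fuzzy_info(const char *pattern, char wildcard,
                             unsigned char *info) {
    assert(pattern != NULL && info != NULL);
    size_t n = strlen(pattern);
    for (size_t i = 0; i < n; ++i) {
        info[i] = (pattern[i] == wildcard) ? 1 : 0;
    }
}

/* Model of the matching rule: every fixed position (info[i] == 0) must
 * equal the pattern byte; non-fixed positions may hold any byte. */
static bool fuzzy_matches(const char *row_key, size_t row_len,
                          const char *pattern, const unsigned char *info,
                          size_t len) {
    assert(row_key != NULL && pattern != NULL && info != NULL);
    if (row_len < len) return false; /* assumption: short keys never match */
    for (size_t i = 0; i < len; ++i) {
        if (info[i] == 0 && row_key[i] != pattern[i]) return false;
    }
    return true;
}
```

For the pattern "????_99_????_01" with '?' as the wildcard, build_fuzzy_info produces the same 15 bytes as the fuzzy info shown in the FuzzyRowFilter entry.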