MapR 5.0 Documentation : Creating MapR-DB Applications with C

MapR-DB includes a version of libhbase, a library of C APIs for creating and accessing Apache HBase tables. You can use libhbase with MapR-DB tables; however, MapR also includes another library of C APIs, libMapRClient, that runs more efficiently against MapR-DB tables.

As does libhbase, libMapRClient uses the following conventions:

  1. All data types are prefixed with 'hb_'.
  2. All exported functions are annotated with HBASE_API, prefixed with 'hb_' and named using the following convention: 'hb_<subject>_<operation>_[<object>|<property>]'
  3. All asynchronous APIs take a callback which is triggered when a request completes. This callback can be triggered in the caller's thread or in another thread. To avoid any potential deadlock or starvation, applications should not block in the callback routine.
  4. All callbacks take a void pointer for application developers to supply their own data. This void pointer is passed back when the callback is triggered.

No explicit batching is supported for asynchronous APIs. Also, there is no support for filters in the MapR 4.1 release.

Applications are responsible for freeing all backing data buffers. For asynchronous APIs, however, applications must not free a buffer until they have received the corresponding callback and have finished working with any results.

For better performance of asynchronous APIs, libMapRClient does not copy data buffers that are allocated for mutations, gets, and scans. These buffers hold table names, namespace identifiers, row keys, column-family names, and column names or qualifiers. Instead, libMapRClient temporarily takes ownership of the buffers and references them with pointers until the callback is triggered.

Therefore, applications should not free memory buffers before receiving callbacks for mutations. Applications also should not free memory buffers before receiving results for gets and scans. If applications must read results, the applications should not free memory buffers until the results are destroyed.

 

libMapRClient makes the following changes:

Additional implemented C APIs

A number of functions that are labeled @NotYetImplemented in libhbase are implemented in libMapRClient. The APIs are in the following header files.

Your applications need include only the hbase.h header file. The header files are listed below only to show the changes made to them in libMapRClient.

admin.h

Two additional APIs that are defined in this header file have been implemented.

/**  
* Adds a column family to an HBase table.
* @returns 0 on success, an error code otherwise.
*/
HBASE_API int32_t
hb_admin_table_add_column_family(
      const hb_admin_t admin,      /* [in] HBaseClient handle */
      const char *name_space,      /* [in] Null terminated namespace, set to NULL
                                    *   for default namespace and for 0.94 version */
      const char *table_name,      /* [in] Null terminated table name */
      const hb_columndesc family); /* [in] New family column descriptor */
 
/**
* Modifies a column family of an HBase table.
* @returns 0 on success, an error code otherwise.
*/
HBASE_API int32_t
hb_admin_table_modify_column_family(
      const hb_admin_t admin,      /* [in] HBaseClient handle */
      const char *name_space,      /* [in] Null terminated namespace, set to NULL
                                    *   for default namespace and for 0.94 version */
      const char *table_name,      /* [in] Null terminated table name */
      const hb_columndesc family); /* [in] New family column descriptor */

get.h

Two additional APIs that are defined in this header file have been implemented.

/**
* Optional. Only columns with the specified timestamp will be included.
*/
HBASE_API int32_t
hb_get_set_timestamp(
     hb_get_t get,
     const int64_t ts);
/**
* Optional. Only columns with timestamp within the specified range will
* be included.
*/
HBASE_API int32_t
hb_get_set_timerange(
     hb_get_t get,
     const int64_t min_ts,
     const int64_t max_ts);

mutation.h

Four additional APIs that are defined in this header file have been implemented.

/**
* Creates a structure for an increment operation and returns its handle.
*/
HBASE_API int32_t
hb_increment_create(
     const byte_t *rowkey,
     const size_t rowkey_len,
     hb_increment_t *increment_ptr);
/**
* Adds a column, and the amount by which its value is to be incremented,
* to the increment operation.
*/
HBASE_API int32_t
hb_increment_add_column(
     hb_increment_t incr,
     const hb_cell_t *cell,
     const int64_t amount);
/**
* Creates a structure for an append operation and returns its handle.
*/
HBASE_API int32_t
hb_append_create(
     const byte_t *rowkey,
     const size_t rowkey_len,
     hb_append_t *append_ptr);
/**
* Add a column for the append operation.
*/
HBASE_API int32_t
hb_append_add_column(
     hb_append_t append,
     const hb_cell_t *cell);

Also, when you use the hb_delete_*() APIs, you must set timestamps to INT64_MAX to delete all versions of a column. In libhbase, you instead set timestamps to -1, as explained in the libhbase version of this header file.

Increment only data that is 8 bytes in length. Before using a put to write a value that is subject to increments, your application must convert the new 8-byte value to big-endian format with the htobe64() function. When getting an incremented value, your application must convert the returned value from big-endian format back to the original 8-byte value. The application can do this with the be64toh() function.

For Windows clients:

  • Use _byteswap_uint64() to convert to a big-endian value before a put.
  • Use _byteswap_uint64() to convert from a big-endian value to an 8-byte value after a get.

Conversion is not needed for the increment operations themselves. It is needed only for puts that write values subject to increments and for gets that read incremented values.

scanner.h

One additional API that is defined in this header file has been implemented.

/**
* Optional. Adds a column family and optionally a column qualifier to
* the hb_scanner_t object.
*/
HBASE_API int32_t
hb_scanner_add_column(hb_scanner_t scanner,
        const byte_t *family,
        const size_t  familyLen,
        const byte_t *qualifier,
        const size_t qualLen);

types.h

One new API is defined in this header file.

/**
 * Inline function to initialize a hb_cell_t
 */
inline int32_t
hb_cell_t_init(hb_cell_t *cell) {
  if (cell) {
    memset(cell, 0, sizeof(hb_cell_t));
    cell->ts = HBASE_LATEST_TIMESTAMP;
    return 0;
  }
  return EINVAL;
}

When an application puts a value into a cell and needs to specify that the value is the most recent version, the application must call hb_cell_t_init(), passing in a pointer to the cell object.

After creating a new cell object, call hb_cell_t_init(). The cell timestamp will be initialized with HBASE_LATEST_TIMESTAMP. At the time of the put, the current timestamp on the server will be used for the version.

If the application does not call hb_cell_t_init(), the timestamp for the version will be non-deterministic.

C API for impersonation

libMapRClient also includes a new function in the connection.h header file: hb_connection_create_as_user(). This function provides support for impersonation, so that you can connect to a MapR cluster and access MapR-DB tables by using a specific username.

/**
 * Creates an hb_connection_t instance for a specific user and initializes its
 * address into the passed pointer.
 */
HBASE_API int32_t
hb_connection_create_as_user(
    const char *zk_quorum,            /* [in] NULL terminated, comma separated
                                       *   string of CLDB servers. e.g.
                                       *   "<server1[:port]>,...". If set to
                                       *   NULL, IP addresses for CLDB nodes will be
                                       *   taken from mapr-clusters.conf */
    const char *zk_root,              /* [in] Ignored for MapR-DB. */
    const char *user,                 /* [in] The user who is being
                                       *   impersonated */
    hb_connection_t *connection_ptr); /* [out] pointer to hb_connection_t */

The user that is passed with the hb_connection_create_as_user() API must have permissions on the tables that the application accesses. For example, to read from a table, the user must have the readperm permission. To write to a table, the user must have the writeperm permission. See Enabling Table Authorizations with Access Control Expressions.

For hb_connection_create() and hb_connection_create_as_user(), the standard C APIs for HBase require a list of ZooKeeper nodes. For MapR-DB, this list is interpreted as a list of CLDB nodes. The zk_root parameter is ignored. If zk_quorum is NULL, then the connection will be created to the default cluster that is listed in the mapr-clusters.conf file.