High Performance C APIs on MapR Database

C Vs. Java APIs

Native languages like C/C++ provide tighter control over memory and over the performance characteristics of an application than languages with automatic memory management. A well-written C++ program that has intimate knowledge of its memory access patterns and of the machine architecture can run several times faster than a Java program that depends on garbage collection. For these reasons, many enterprise developers with massive scalability and performance requirements use C/C++ rather than Java in their server applications. This is what motivated us to provide C APIs for MapR Database.

With the 4.1 release of the MapR Distribution, we have extended the libMapRClient library to allow users to write applications in C or C++ that interact efficiently with MapR Database. A paradoxical side effect is that widely used dynamic languages such as JavaScript and Python also benefit from efficient C access, even though they are not normally viewed as high-performance languages.


A C language API for HBase, known as libHBase, was released in March of last year (https://github.com/mapr/libhbase). This implementation used the AsyncHBase Java library to interact with the HBase cluster. Since MapR Database supports AsyncHBase as well as the synchronous HBase APIs, libHBase can talk to MapR Database as well as to HBase. The libHBase APIs are much faster than the HBase Thrift APIs, but they still incur a serious penalty from embedding Java code in a C program: the embedding forces data to be copied from C data structures to Java. Worse, because the MapR Database client performs its RPCs in native code, applications that use libHBase with MapR Database pay this penalty twice, since the data crosses the JNI boundary from C into Java and then back into native code. The figure below shows how this happens.

[Figure: C API - MapR Database]

The motivation for this project was to bypass the Java layer completely: the application calls the native MapR Database client from C, which encodes the application's data directly into RPC buffers. The following figure shows how this eliminates both crossings of the JNI barrier.

[Figure: C API libMapRClient]

Bypassing the Java layer gives C applications the following advantages:

  • No JVM is spawned
  • No JNI (Java Native Interface) overhead is imposed on the application
  • No duplication of data buffers is needed to move data between Java and C
  • No uncertainty from garbage collection pauses
  • Tighter control over memory and CPU usage

Asynchronous Architecture

The MapR Database C APIs are asynchronous, which means that calls return immediately, before any results are received. The alternative is to make every call wait for completion. Our experience, and that of many others, is that RPC calls that block until completion are a serious impediment to high performance at scale. This was the original reason for the introduction of the AsyncHBase library. If an application requires a synchronous API, it is easy to write synchronous wrappers around the asynchronous methods: invoke the method, then wait for its callback. Converting a synchronous API into a performant asynchronous one is much more difficult.
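The wrapper pattern described above can be sketched with POSIX threads. This is a minimal illustration under stated assumptions: async_get() is a hypothetical stand-in for an asynchronous MapR Database call (it is not part of libMapRClient) that completes on a background thread and hands its result to a callback together with an opaque extra pointer. The synchronous wrapper sync_get() invokes it and blocks on a condition variable until the callback fires.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical callback type: error code, result, and the caller's extra pointer. */
typedef void (*get_cb)(int32_t err, int64_t value, void *extra);

struct async_req { int64_t key; get_cb cb; void *extra; };

/* Simulated RPC completion on another thread (hypothetical, not a MapR API). */
static void *complete_rpc(void *arg) {
    struct async_req *req = arg;
    req->cb(0, req->key * 2, req->extra);   /* pretend the server returns key * 2 */
    free(req);
    return NULL;
}

/* Hypothetical asynchronous call: returns immediately, before the result exists. */
static int32_t async_get(int64_t key, get_cb cb, void *extra) {
    struct async_req *req = malloc(sizeof *req);
    if (req == NULL) return -1;
    req->key = key; req->cb = cb; req->extra = extra;
    pthread_t t;
    if (pthread_create(&t, NULL, complete_rpc, req) != 0) { free(req); return -1; }
    pthread_detach(t);
    return 0;
}

/* State shared between the waiting caller and the callback. */
struct sync_ctx {
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    int             done;
    int32_t         err;
    int64_t         value;
};

/* Callback used by the wrapper: store the result and wake the caller. */
static void sync_cb(int32_t err, int64_t value, void *extra) {
    struct sync_ctx *ctx = extra;
    pthread_mutex_lock(&ctx->lock);
    ctx->err = err;
    ctx->value = value;
    ctx->done = 1;
    pthread_cond_signal(&ctx->done_cv);
    pthread_mutex_unlock(&ctx->lock);
}

/* Synchronous wrapper: invoke the async call, then wait for its callback. */
static int32_t sync_get(int64_t key, int64_t *out) {
    struct sync_ctx ctx = { .done = 0, .err = 0, .value = 0 };
    pthread_mutex_init(&ctx.lock, NULL);
    pthread_cond_init(&ctx.done_cv, NULL);

    int32_t rc = async_get(key, sync_cb, &ctx);
    if (rc == 0) {
        pthread_mutex_lock(&ctx.lock);
        while (!ctx.done)                   /* recheck: tolerates spurious wakeups */
            pthread_cond_wait(&ctx.done_cv, &ctx.lock);
        pthread_mutex_unlock(&ctx.lock);
        rc = ctx.err;
        if (rc == 0) *out = ctx.value;
    }
    pthread_cond_destroy(&ctx.done_cv);
    pthread_mutex_destroy(&ctx.lock);
    return rc;
}
```

Because the wait loop checks the done flag under the mutex, the wrapper also works when the callback runs before the caller starts waiting.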

A practical consequence is that every method that can result in an RPC accepts a callback as an argument.

As an example, here is the core API entry point for any operation that mutates data. The cb argument is the callback, the mutation argument specifies the actual operation, and extra is an opaque pointer that is passed through to the callback.

HBASE_API int32_t
…(
    hb_client_t client,
    hb_mutation_t mutation,
    hb_mutation_cb cb,
    void *extra);

The following figure shows how the APIs work internally.

[Figure: C APIs - HBase and MapR Database]

When one of these asynchronous methods is invoked, a work item is created and queued for processing on the client side. One of the threads in a thread pool picks up this work item as soon as possible. When the response to an RPC is received, a thread from the pool invokes the callback.
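The work-item model described above resembles a classic thread pool. The sketch below is an illustration only: struct pool, pool_submit() and the callback type are hypothetical names, not libMapRClient internals, and the "RPC" is reduced to running the callback on a worker thread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void (*work_cb)(void *extra);

struct work_item {
    work_cb cb;
    void *extra;
    struct work_item *next;
};

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t has_work;
    struct work_item *head, *tail;   /* FIFO queue of pending work items */
    bool shutting_down;
    int nthreads;
    pthread_t threads[8];
};

/* Worker loop: wait for an item, then run its callback outside the lock. */
static void *worker(void *arg) {
    struct pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->head == NULL && !p->shutting_down)
            pthread_cond_wait(&p->has_work, &p->lock);
        if (p->head == NULL) {               /* shutting down and queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        struct work_item *it = p->head;
        p->head = it->next;
        if (p->head == NULL) p->tail = NULL;
        pthread_mutex_unlock(&p->lock);

        it->cb(it->extra);                   /* "response arrived": invoke the callback */
        free(it);
    }
}

static int pool_start(struct pool *p, int nthreads) {
    if (nthreads < 1 || nthreads > 8) return -1;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->has_work, NULL);
    p->head = p->tail = NULL;
    p->shutting_down = false;
    p->nthreads = nthreads;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&p->threads[i], NULL, worker, p) != 0) {
            p->nthreads = i;                 /* only join threads that started */
            break;
        }
    }
    return p->nthreads > 0 ? 0 : -1;
}

/* Called by an asynchronous API method: queue the item and return at once. */
static int pool_submit(struct pool *p, work_cb cb, void *extra) {
    struct work_item *it = malloc(sizeof *it);
    if (it == NULL) return -1;
    it->cb = cb; it->extra = extra; it->next = NULL;
    pthread_mutex_lock(&p->lock);
    if (p->tail) p->tail->next = it; else p->head = it;
    p->tail = it;
    pthread_cond_signal(&p->has_work);
    pthread_mutex_unlock(&p->lock);
    return 0;
}

/* Let workers drain the queue, then join them. */
static void pool_stop(struct pool *p) {
    pthread_mutex_lock(&p->lock);
    p->shutting_down = true;
    pthread_cond_broadcast(&p->has_work);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->nthreads; i++)
        pthread_join(p->threads[i], NULL);
    pthread_cond_destroy(&p->has_work);
    pthread_mutex_destroy(&p->lock);
}
```

Note that this queue is unbounded; nothing stops a fast producer from growing it indefinitely.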

Client applications can often issue requests faster than RPCs complete, so we need to make sure that the queue of work items does not grow without bound. For this reason, the configuration parameter fs.mapr.pool.queue.max_size (default 10000) controls the maximum size of the work item queue. You can change this parameter by editing the /opt/mapr/conf/dbclient.conf file.

[Figure: C APIs - HBase and MapR Database]
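A queue whose size is capped, the way fs.mapr.pool.queue.max_size caps the work item queue, can be sketched as a ring buffer that fails fast when full instead of growing. The bounded_queue type and its functions are hypothetical; this sketch reports the full condition with the POSIX ENOBUFS error code, matching the error the library returns for asynchronous calls when its queue is at the limit.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct bounded_queue {
    pthread_mutex_t lock;
    void **items;      /* ring buffer of opaque work items */
    size_t max_size;   /* plays the role of fs.mapr.pool.queue.max_size */
    size_t head;       /* index of the oldest item */
    size_t count;      /* number of queued items */
};

static int bq_init(struct bounded_queue *q, size_t max_size) {
    if (max_size == 0) return EINVAL;
    q->items = calloc(max_size, sizeof *q->items);
    if (q->items == NULL) return ENOMEM;
    pthread_mutex_init(&q->lock, NULL);
    q->max_size = max_size;
    q->head = 0;
    q->count = 0;
    return 0;
}

/* Non-blocking enqueue: return ENOBUFS rather than grow without bound. */
static int bq_push(struct bounded_queue *q, void *item) {
    int rc = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count == q->max_size) {
        rc = ENOBUFS;
    } else {
        q->items[(q->head + q->count) % q->max_size] = item;
        q->count++;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Dequeue the oldest item; returns NULL when the queue is empty. */
static void *bq_pop(struct bounded_queue *q) {
    void *item = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        item = q->items[q->head];
        q->head = (q->head + 1) % q->max_size;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return item;
}

static void bq_destroy(struct bounded_queue *q) {
    pthread_mutex_destroy(&q->lock);
    free(q->items);
}
```

Failing fast keeps memory use bounded and pushes the decision about what to do next back to the caller.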

Whenever the work item queue reaches this limit, the library returns ENOBUFS errors for asynchronous calls. The client application is expected to handle this error and may retry the asynchronous call after some time. Another option is to pass a shared global condition variable to all callbacks via the extra argument, so that each callback signals the condition variable as it completes. The completion of any pending callback is a likely indication that the ENOBUFS condition has cleared and the operation can be retried.
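The condition-variable strategy can be sketched as follows. Everything here is hypothetical illustration: submit_fn stands for any asynchronous MapR Database call that may fail with ENOBUFS, and signal_completion() is what each application callback would call with its extra pointer. Snapshotting a completion counter before submitting avoids a lost wakeup when a callback finishes between the failed call and the wait, and a 100 ms timed wait bounds the delay in any case.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

struct completion_signal {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    uint64_t        completions;   /* incremented by every callback */
};

/* Shared global signal, passed to every call as its extra argument. */
static struct completion_signal g_sig = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Each application callback calls this as its last step. */
static void signal_completion(void *extra) {
    struct completion_signal *s = extra;
    pthread_mutex_lock(&s->lock);
    s->completions++;
    pthread_cond_broadcast(&s->cv);   /* wake every blocked submitter */
    pthread_mutex_unlock(&s->lock);
}

/* Hypothetical asynchronous submit: returns 0 or an errno such as ENOBUFS. */
typedef int (*submit_fn)(void *op, void *extra);

/* Retry on ENOBUFS until accepted, another error occurs, or attempts run out. */
static int submit_with_retry(submit_fn submit, void *op, int max_attempts) {
    int rc = ENOBUFS;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        pthread_mutex_lock(&g_sig.lock);
        uint64_t seen = g_sig.completions;   /* snapshot before submitting */
        pthread_mutex_unlock(&g_sig.lock);

        rc = submit(op, &g_sig);
        if (rc != ENOBUFS) return rc;

        /* Wait for any callback to complete, or at most 100 ms. */
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100L * 1000 * 1000;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        pthread_mutex_lock(&g_sig.lock);
        while (g_sig.completions == seen) {
            if (pthread_cond_timedwait(&g_sig.cv, &g_sig.lock, &deadline) == ETIMEDOUT)
                break;
        }
        pthread_mutex_unlock(&g_sig.lock);
    }
    return rc;
}
```

Using a broadcast rather than a single signal lets every thread that is waiting to submit retry once space may be available.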


Performance was our foremost goal when we started working on this project. On that front, our implementation has the following characteristics:

  • The library does not copy any buffers allocated by the user application. Instead, it keeps references to them and encodes them directly into RPC buffers. The library therefore expects the application to give up ownership of these buffers until the callback is invoked. Once the callback is invoked, ownership returns to the application, which can then destroy or reuse the buffers as appropriate.
  • Configuration parameters, such as fs.mapr.pool.queue.max_size, can be tuned by the client application to trade throughput against resource usage.
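The buffer ownership rule in the first bullet can be illustrated with a deterministic, single-threaded sketch. fake_async_put() and fake_complete_pending() are hypothetical stand-ins for a real asynchronous call and for the library's later completion; like the library described above, they keep a pointer to the application's buffer rather than a copy.

```c
#include <stdlib.h>
#include <string.h>

typedef void (*put_cb)(int err, void *buf, size_t len, void *extra);

/* The hypothetical "library" holds only a reference to the caller's buffer. */
static struct {
    void *buf; size_t len; put_cb cb; void *extra; int busy;
} pending;

static int fake_async_put(void *buf, size_t len, put_cb cb, void *extra) {
    if (pending.busy) return -1;
    pending.buf = buf; pending.len = len;
    pending.cb = cb; pending.extra = extra; pending.busy = 1;
    return 0;   /* buffer now belongs to the library until cb runs */
}

/* Later, when the RPC finishes, the library hands the buffer back via cb. */
static void fake_complete_pending(void) {
    if (!pending.busy) return;
    pending.busy = 0;
    pending.cb(0, pending.buf, pending.len, pending.extra);
}

/* Application callback: ownership has returned, so the buffer can be freed. */
static void on_put_done(int err, void *buf, size_t len, void *extra) {
    (void)err; (void)len;
    int *freed = extra;
    free(buf);
    (*freed)++;
}

/* Application side: allocate and fill a buffer, then hand it off. */
static int put_value(const char *value, int *freed_counter) {
    size_t len = strlen(value);
    char *buf = malloc(len + 1);
    if (buf == NULL) return -1;
    memcpy(buf, value, len + 1);
    int rc = fake_async_put(buf, len, on_put_done, freed_counter);
    if (rc != 0) free(buf);   /* call rejected: ownership never left us */
    return rc;                /* on success, do NOT free or modify buf here */
}
```

Freeing or reusing buf in put_value after a successful call would hand the library a dangling or altered buffer, which is exactly what the ownership rule forbids.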

Our new API has a number of important features that are not available in libHBase:

  • Secure user impersonation when creating a connection

  • Adding or modifying column families of a table

  • Setting timestamps or a time range for get and scan operations

  • Increment and append mutations

  • Filtering on column family and column name in scan operations

  • For MapR v5.0 and later: filter support for get and scan operations using the HBase Thrift filter language (http://hbase.apache.org/0.94/book/thrift.html)

Learn more about creating native applications for MapR Database here: http://doc.mapr.com/display/MapR/Creating+MapR-DB+Applications+with+C

Getting Started

To help you get started quickly, we have added two sample applications to the installation package. You can find them under the /opt/mapr/examples directory when you install mapr-client. These applications are also available in a GitHub repository: https://github.com/mapr-demos/c-api-sample-applications

Learn more about the sample applications here:

In this blog post, you’ve learned about high performance C APIs on MapR Database. If you have any questions, please add your comments in the section below.

This blog post was published July 17, 2015.