Dataware for data-driven transformation

Robust Application Development for MapR Database

Contributed by

14 min read

This blog post is the second in a series on MapR's Enterprise-Ready Database for Analytics. The first blog post described the key requirements for a successful database for analytics. This post focuses on factors that are must-haves for robust application development.

One key factor is how you represent your data. The data you need to analyze in modern data processing applications has become more varied, more complex, and less structured. To meet your business requirements, it is no longer sufficient to store your data in a traditional relational database. JSON is becoming the de facto standard for modeling your data because of its expressiveness and its ability to handle evolving data schemas.

Applications also require multiple ways to access your data, whether through a programmatic language or a higher level language like SQL. The choice of language varies, depending on the requirements of your application ecosystem. You must be able to easily get your data in and out of the system at the application layer.

MapR Database meets all of these requirements with its support for storage and retrieval of JSON documents. This blog post introduces MapR Database JSON and describes how it enables you to write:

  • Multi-threaded middleware applications
  • Expressive queries on complex data
  • Applications in a wide variety of programming languages

JSON Documents - A Brief Overview

Before describing the specifics of MapR Database, for readers who are unfamiliar with the concept of JSON documents, this section provides a brief overview.

A JSON document is a tree of fields. Each field has a name, type, and value. The following is a simple example of a JSON document:

{
  "_id":"student1",    
  "name":"Alice",
  "street":"123 Ballmer Av",
  "Zipcode":12345,
  "state":"CA"
}

JSON documents provide a powerful way to represent your data in a hierarchical and nested fashion. They also provide schema flexibility, as you do not need to predefine the schema of a JSON document.

The Power and Flexibility of JSON Documents

You can represent complex, unstructured data in a JSON document as well as simple data. JSON documents consist of fields with the following data types:

Data Type Examples
Scalar data, including numbers, strings, and floating point values 10
"ten"
10.0
Arrays, which are sometimes referred to as lists of values [10, "ten", 10.0]
Nested documents, which are sometimes referred to as maps of key/value pairs
{
"_id" : "2DT3201",
"product_ID" : "2DT3201",
"name" : " Allegro SPD-SL 6800",
"brand" : "Careen",
"category" : "Pedals",
"type" : "Components",
"price" : 112.99,
}

Array elements can be nested documents as well as arrays. This enables you to represent data as multi-dimensional arrays. You can mix nested documents and arrays across multiple levels in a JSON document field. There is no limit on the number of nesting levels. These capabilities enable you to represent complex, unstructured data in a JSON document.

The following is a sample JSON document that contains fields with multiple nesting levels:

{ "_id": "account001", "projects": [  
     { "id": "proj001", "manager": { "name": "Guy Bones", "email": "gbones@pro.com"}, "customer": {  
           "name": "My Company",
           "contacts": [  
              { "id": "user_jdoe", "emails": [  
                 { "type": "work", "value": "jdoe@comp.com" },

               { "type": "personal", "value": "jdoe@gmail.com"}  
              ], "addresses": [  
                 { "type": "work", "value": {"street":"21 King Av","city": "Redwood", "zip": 94065, "state": "CA"}  
                 }  
              ], "phones": [  
                 { "type": "cell", "value": "+16505556764"},  
                 { "type": "office", "value": "+14075556764"}], "role": "CEO"},  
              { "id": "user_simsom", "emails": [  
                 { "type": "work", "value": "simson@comp.com"},  
                 { "type": "personal", "value": "simson@gmail.com"}  
              ], "addresses": [  
                 {  
                 "type": "work",  
                 "value": {"street":"21 King Av.","city":"Redwood","zip": 94065,"state": "CA"}  
                 }  
              ], "phones": [  
                 { "type": "cell", "value": "+16505556777"},  
                 { "type": "office", "value": "+1407555444"}], "role": "PM"  
              }  
           ]  
        }  
     }
   ]
}

Accessing JSON Document Fields

Field paths are what you use to access JSON document fields. A field path consists of the names of fields and subfields, using a nested dot notation. For arrays, you use square brackets with an array index number. You can mix these notations as required for your data model.

For example, using the JSON document from the previous section, the following are examples of field paths and the value returned by each:

Field Path Value Returned
_id
{
"_id" : "account001"
}
projects[0].manager
{
"projects" : [ {
"manager" : {
"email" : "gbones@pro.com",
"name" : "Guy Bones"
}
} ]
}
projects[0].customer.name,
projects[0].customer.contacts[0].id,
projects[0].customer.contacts[0].phones[0].value
{
"projects" : [ {
"customer" : {
"contacts" : [ {
"id" : "user_jdoe",
"phones" : [ {
"value" : "+16505556764"
} ]
} ],
"name" : "My Company"
}
} ]
}

Writing Applications to Access JSON Documents

MapR Database enables you to store and retrieve JSON documents. You can do this either through the SQL query language or using the Open JSON Application Interface (OJAI). Typically, you use SQL for analytic applications and queries, and OJAI for operational workloads. SQL provides rich semantics for analytics, and a programmatic API provides ability to write fast multi-threaded apps.

This blog post focuses on OJAI. MapR provides SQL access to MapR Database via a number of SQL tools, including Apache Drill, Apache Spark, and Apache Hive. Apache Drill is performance optimized. A subsequent blog post will cover MapR Database's integration with Apache Drill in more detail.

OJAI provides Document Object Model (DOM)-based APIs for creating, updating, and reading JSON documents. Using DOM-based APIs, you can build and read the entire JSON document in memory. For larger, more complex documents, OJAI also supports other

APIs that allows you to sequentially create and read document fields without having to keep the entire document in memory. This allows you to write higher performance applications because it minimizes data transfer between your application and MapR Database. For more information about the difference between these two API variations, see Interfaces for Working with Documents.

With a programmatic API like OJAI, you can write applications that support:

  • Asynchronous and synchronous processing
  • Multi-threaded applications
  • Large scale data processing
  • Operational analytics

Optimizing Query Performance with MapR Database

Optimal database performance is driven by reduced data processing and reduced I/O. One way MapR Database achieves this is by leveraging native secondary indexes. Secondary indexes improve performance by significantly reducing the amount of data MapR Database needs to read to process your queries. MapR Database automatically chooses the most optimal index by using a sophisticated, cost-based query optimizer. See Secondary Indexes for more information. A subsequent blog post will also provide further details.

Another way MapR Database improves performance is by pushing filters on your data into MapR Database, so only the documents you need are transferred to your application. OJAI supports unique features to facilitate this, including features that enable you to manipulate complex data types in a document.

Filtering JSON Document Fields with Query Conditions

To filter JSON documents from a table, you need to specify a query condition. The OJAI interface includes a syntax for specifying querying conditions. You can either express the conditions programmatically using OJAI API methods or in a JSON format using OJAI syntax.

In this blog post, we only describe the JSON format syntax. This is the general format of the syntax:

{"operator":{"fieldpath":value}}

OJAI supports the usual set of operators, including:

  • $eq, $gt, $ge, $lt, $le, $ne
  • $matches, $notmatches, $like, $notlike, $in, $notin
  • $and, $or

For a complete list of operators, see OJAI Query Condition Operators.

Using the same sample JSON document from earlier, the following query condition qualifies that document:

{"$eq":{"_id":"account001"}}

Likewise for the following:

{
 "$eq":{
  "projects[0].customer.contacts[0].phones[0].value":"+16505556764"
  }
}

Notice, however, that the field path explicitly references the first element of each array. Suppose you want to apply a condition where the matching value appears in any element of these arrays. Starting in MapR Database 6.1, to manipulate arbitrary array elements, you no longer have to enumerate through all possible array elements in your application, which was inconvenient, and in some cases impractical. Moreover, it required transferring the entire array to your application, which impacted performance.

Using Container Field Paths to Access Complex Data Type Fields

To address the dilemma alluded to in the previous section, MapR Database 6.1 introduces the notion of a container field path. Using a container field path, you can access a field that is any arbitrary array element. This enables you to write more expressive queries on complex types, which includes arrays of scalar types and arrays of nested documents.

To specify a container field path, place square brackets after the field name:

fieldName[]

To show how this works, let's take the examples from the earlier section but replace the explicit array element references with container field paths:

Field Path Value Returned
projects[].manager
{
"projects" : [ {
"manager" : {
"email" : "gbones@pro.com",
"name" : "Guy Bones"
}
} ]
}
projects[].customer.name,
projects[].customer.contacts[].id,
projects[].customer.contacts[].phones[].value
{
"projects" : [ {
"customer" : {
"contacts" : [ {
"id" : "user_jdoe",
"phones" : [ {
"value" : "+16505556764"
}, {
"value" : "+14075556764"
} ]
}, {
"id" : "user_simsom",
"phones" : [ {
"value" : "+16505556777"
}, {
"value" : "+1407555444"
} ]
} ],
"name" : "My Company"
}
} ]
}

Notice that the query result for the first field path is identical to the earlier one because the document has only one element in the projects array. However, for the second, the result includes all elements in the contacts and phones arrays.

Filtering Complex Data Type Fields

Container field paths are especially beneficial for filtering. Extending the earlier example to use a container field path instead of explicit array indexes, we can locate the matching phone number without having to know which array elements it belongs to:

{
 "$eq":{
  "projects[].customer.contacts[].phones[].value":"+1407555444"
  }
}

For more examples about how to use container field paths in query conditions, see OJAI Query Conditions Using Container Field Paths.

Using Container Field Paths to Support Schema Evolution

Although this blog post does not cover this topic, another flexible aspect of container field paths is it treats a single value as an array with one element. This enables you to use a container field path to access a field that has both array elements and scalar values. This means that if a field evolves from a single value to an array of values, you do not have to modify your queries.

Applying Multiple Query Conditions on Container Field Paths

Another important feature of container field paths are query conditions where you want to apply multiple conditions on the same array element using a container field path. The OJAI syntax provides a new $elementAnd operator to support this functionality.

To learn how to use the $elementAndoperator, see OJAI Query Conditions Using elementAnd.

Multiple Language Support in OJAI

You can write your OJAI applications in Java, and starting with MEP 6.0, also Node.js and Python. Additional languages are planned for upcoming releases.

The API in each language is tailored to developers of the respective language. While the Java API supports a more robust set of features, the Node.js and Python APIs are lighter-weight and emphasize ease of installation and use.

MapR Database also supports a REST API. The REST API enables you to use HTTP operations to perform basic operations on JSON tables, including querying.

The following links provide more information about the APIs described and sample programs for each language:

Summary

MapR Database is a scalable document database with native JSON support. With JSON support, you can represent complex, unstructured data without having to define a fixed schema. As your business requirements change and your schemas evolve, JSON handles this without your needing to modify your applications.

MapR Database provides a rich programmatic interface through the OJAI interface. It enables you to write expressive queries that efficiently access your complex JSON documents. It also enables you to run high-performing, complex analytics on your large-scale data. The interfaces are available in multiple languages, to align with your application ecosystem requirements.

With the power and flexibility of JSON and the functionality of the OJAI interface, you are no longer restricted by the limitations of a traditional relational database.

Subsequent blog posts will further explore other technical aspects of MapR Database.


This blog post was published December 10, 2018.
Categories

50,000+ of the smartest have already joined!

Stay ahead of the bleeding edge...get the best of Big Data in your inbox.


Get our latest posts in your inbox

Subscribe Now