MapR Database 6.0 - The Modern Database for Global Data-Intensive Applications

Today, we are delighted to announce the next level of advancement of the MapR platform with the latest release of MapR Database - The modern database for global data-intensive applications.

At MapR, we are fortunate to work alongside leading Global 2000 companies on their journey to digitally transform themselves. Digital transformation starts with providing rich contextual user experiences to attract, engage, and retain key stakeholders. Our customers are making big bets on modernizing core business processes, uncovering real-time insights, and enabling automated decision making to cut costs and innovate faster.

Enabling this business transformation requires these organizations to deal with an extreme scale of data and applications. These data-intensive applications need the ability to capture and interact with diverse types of data, such as IoT data, and leverage them to maximize business outcomes. Customers are trying to move from after-the-fact insights and processes to real-time processes and in-the-moment proactive actions by infusing ML/AI-driven data intelligence. Data is the foundational enabler in all these “data-intensive” applications.

As organizations aspire to build these applications, they need critical technology building blocks, and databases are the foundational components of application architecture for operationalizing the data. While working with these customers over the past few years, we have identified 3 critical areas of concern around databases.

  1. Databases are not islands any more - To enable next-generation data-intensive applications, a database must be part of a broader data platform that includes analytic processing for in-place intelligence and streaming for real-time data flows, establishing a continuum of data, insights, and operations that come together in real time.

  2. Database selection at organizations today is often a matter of complex trade-offs among critical requirements - Is write ingest speed critical, or read speed? Is data consistency critical, or performance? Is this data really important to me, or is it OK to lose it because app availability is a lot more important? Can I live with coarse-grained security? Do I need wide-column performance or JSON flexibility? The result is that, while there are many new databases on the market, they are often used in niche, purpose-built use cases or as part of auxiliary applications. Most mission-critical business apps stay in well-trusted traditional RDBMS systems, do not leverage modern technology trends, and therefore do not fully achieve the desired business transformation.

  3. Every database app today is a silo in its own infrastructure, and the result is hundreds to thousands of non-integrated apps. This is a daunting reality. Databases traditionally are not built to run multiple apps simultaneously while still meeting SLAs. The result is infrastructure complexity and cost, and the need to interconnect these apps using fragile data pipelines.

At MapR, our goal has been to build a complete data platform with a built-in, modern, scalable database for creating these breakthrough data-intensive applications across on-prem, edge, and multi-cloud environments, with no complex trade-offs or compromises.

MapR Database supports a broad variety of applications by bringing critical database capabilities into one system, as shown below.

MapR Database Diagram

MapR Database Continuous Innovation & What’s new in 6.0

Over the last 3 years, MapR has systematically built MapR Database to be a converged and complete database. The latest MapR Database 6.0 release delivers on this broader vision.

Here is the evolution of the database over the past few years, leading up to MapR Database 6.0.

MapR Database Milestones

MapR Database 6.0 is a significant milestone. With this, we are introducing several new capabilities & performance improvements to expand the usage of the database in organizations.

Here is a summary of the key features in this release.

Powerful and efficient data access w/native secondary indexes

Prior to 6.0, MapR Database was optimized for access based only on the rowkey. The new built-in secondary indexes expand on this by supporting flexible and efficient queries on any columns in the DB tables at scale. This enables application developers to build rich, new types of applications that support complex user interaction patterns, and it lets business users run optimized, high-performance SQL queries using familiar BI/analytics tools.

The key features of the Secondary Indexing functionality include:

  • Native Secondary indexes for MapR Database JSON tables - no external indexing system such as Elasticsearch or Solr necessary
  • Scalable & Enterprise grade indexing with auto-propagation, auto-scale & auto-management
  • Extreme index scalability & performance with SSD optimizations
  • Rich indexing functionality - unlimited indexes, composite indexes with a large number of columns, comprehensive data type support, hashed indexes, covering/non-covering query support, security, and more
  • Highly functional and seamless queries across primary and secondary index tables
  • Optimized index based access for application development & BI/Analytics
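
To make the covering/non-covering distinction above concrete, here is a minimal in-memory sketch in plain Python. It is not MapR Database code and uses no MapR API: the `table`, `build_index`, and `query` names are hypothetical, and the model assumes an index simply maps an indexed value to row keys plus an optional copy of a few included fields.

```python
from collections import defaultdict

# Hypothetical in-memory "table": row key -> JSON-like document.
table = {
    "u1": {"name": "Ana", "city": "Lisbon", "age": 31},
    "u2": {"name": "Bo", "city": "Oslo", "age": 45},
    "u3": {"name": "Cy", "city": "Lisbon", "age": 27},
}


def build_index(table, indexed_field, included_fields=()):
    """Map each indexed value to (row key, copy of included fields) entries."""
    index = defaultdict(list)
    for row_key, doc in table.items():
        included = {f: doc[f] for f in included_fields}
        index[doc[indexed_field]].append((row_key, included))
    return index


def query(table, index, value, wanted_fields):
    """Return (rows, primary_lookups); a covered query needs no primary lookups."""
    rows, primary_lookups = [], 0
    for row_key, included in index.get(value, []):
        if all(f in included for f in wanted_fields):
            # Covering case: the index entry already holds every requested field.
            rows.append({f: included[f] for f in wanted_fields})
        else:
            # Non-covering case: follow the row key back to the primary table.
            primary_lookups += 1
            rows.append({f: table[row_key][f] for f in wanted_fields})
    return rows, primary_lookups


# Index on "city" that also stores a copy of "name".
city_index = build_index(table, "city", included_fields=("name",))
covered = query(table, city_index, "Lisbon", ["name"])
not_covered = query(table, city_index, "Lisbon", ["name", "age"])
```

In this toy model, asking only for `name` is answered from the index alone, while asking for `age` as well forces one primary-table lookup per matching row. That is the trade-off a covering index is meant to avoid.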

Rich and expanded application development with MapR Database OJAI 2.0 APIs

OJAI (Open JSON Application Interface) is the API for developing applications with the MapR Database document data model. In 6.0, we are expanding the API with more functionality and better performance.

The new capabilities include:

  • New & intuitive OJAI query interface
  • JSON grammar & Fluent API semantics
  • Rich expressive language support including conditional filtering, sorting & pagination support
  • Efficient queries w/seamless index based access
  • Smart query execution to support operational and operational analytic applications on any data scale and with any query complexity
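
As a rough illustration of what conditional filtering, sorting, and pagination mean for a document query, the sketch below evaluates a small query description against a list of JSON-like documents in plain Python. It is not the OJAI API or its JSON grammar: the query keys (`where`, `order_by`, `offset`, `limit`) and the `run_query` helper are assumptions made up for this example.

```python
import operator

# Comparison operators supported by this toy query evaluator.
OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def run_query(docs, query):
    """Filter, sort, then paginate a list of JSON-like documents."""
    field, op, value = query["where"]
    matched = [d for d in docs if field in d and OPS[op](d[field], value)]
    matched.sort(key=lambda d: d[query["order_by"]])
    start = query.get("offset", 0)
    return matched[start:start + query["limit"]]


orders = [
    {"_id": "o1", "status": "open", "total": 40},
    {"_id": "o2", "status": "closed", "total": 15},
    {"_id": "o3", "status": "open", "total": 25},
    {"_id": "o4", "status": "open", "total": 60},
]

# Hypothetical query: second page (page size 2, offset 1) of open orders by total.
page = run_query(
    orders,
    {"where": ("status", "eq", "open"), "order_by": "total", "offset": 1, "limit": 2},
)
page_ids = [d["_id"] for d in page]
```

A real query engine would push the filter and sort down to an index rather than scanning and sorting every document in memory, which is where the index-based access mentioned above comes in.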

Optimized Drill/DB integration for in-place SQL Data Exploration & Operational BI

Apache Drill provides flexible SQL analytics on the data in MapR Database JSON tables. Drill is a distributed SQL query engine and serves as a unified interactive access layer for the MapR platform bringing together data from MapR XD and MapR Database.

The new MapR Database and Drill capabilities optimize SQL data access on MapR Database, speeding up ad-hoc queries. The new capabilities include:

  • Ability for a variety of Drill SQL queries to seamlessly leverage MapR Database secondary indexes to significantly speed up query performance & avoid large scans
  • Statistics, selectivity, and cost-based Index selection
  • Index support for Filter/Sort/Offset/Limit operators
  • Comprehensive index functionality support including single, composite, covering/non-covering indexes, and index intersection
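
The idea behind statistics, selectivity, and cost-based index selection can be sketched with a toy cost model. This is not Drill's planner: the `choose_plan` function, the statistics layout, and the cost constants are assumptions chosen for illustration. It assumes an equality predicate on a column matches roughly `rows / distinct values` rows, and that an index lookup costs more per row than a sequential scan.

```python
def choose_plan(stats, predicate_columns, indexed_columns,
                seq_cost=1.0, lookup_cost=4.0):
    """Pick the cheapest access path for equality predicates on the given columns.

    Full scan cost  = rows * seq_cost
    Index plan cost = estimated matching rows * lookup_cost
    """
    best = ("full_scan", stats["rows"] * seq_cost)
    for col in predicate_columns:
        if col not in indexed_columns:
            continue
        # Selectivity estimate: equality on a column matches rows / distinct values.
        estimated_rows = stats["rows"] / stats["distinct"][col]
        cost = estimated_rows * lookup_cost
        if cost < best[1]:
            best = ("index:" + col, cost)
    return best


# Hypothetical table statistics.
stats = {"rows": 1_000_000, "distinct": {"country": 2, "customer_id": 50_000}}

selective = choose_plan(stats, ["country", "customer_id"], {"country", "customer_id"})
unselective = choose_plan(stats, ["country"], {"country"})
```

With these numbers, the highly selective `customer_id` index wins easily, while an index on the two-valued `country` column loses to a full scan because it would still touch half the table at a higher per-row cost.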

In-place Advanced analytics/ML on MapR Database JSON with Native Spark connectivity

MapR Database 6.0 deeply integrates Apache Spark with MapR Database JSON tables. Customers can use these capabilities to perform real-time data processing as well as build and serve machine learning models directly on MapR Database tables, without creating analytic silos.

The new capabilities of this integration include:

  • Batch & real-time data processing support with native Spark connectivity
  • Support for all key Spark constructs - RDDs, DataFrames/Datasets
  • Optimized Spark performance with projection and filter pushdown
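
Projection and filter pushdown can be pictured with a small plain-Python sketch. It does not use Spark or the MapR Database connector: the `scan` function stands in for the storage side, and all names here are hypothetical. The point is only that applying the filter and field selection at the source shrinks what has to be shipped to the processing engine.

```python
def scan(table, predicate=None, fields=None):
    """Source-side scan: apply any pushed-down filter and projection before returning rows."""
    for doc in table:
        if predicate is not None and not predicate(doc):
            continue
        yield doc if fields is None else {f: doc[f] for f in fields}


def is_hot(doc):
    """Example filter: keep readings at or above 24 degrees."""
    return doc["temp"] >= 24


# Hypothetical sensor table with six documents (temp runs from 20 to 25).
table = [{"id": i, "temp": 20 + i, "site": "A" if i % 2 == 0 else "B"} for i in range(6)]

# Without pushdown: ship every full document, then filter and project in the engine.
shipped_all = list(scan(table))
no_pushdown = [{"id": d["id"]} for d in shipped_all if is_hot(d)]

# With pushdown: the source returns only matching rows and only the requested field.
shipped_filtered = list(scan(table, predicate=is_hot, fields=["id"]))
```

Both paths produce the same result, but the pushdown path moves two single-field rows instead of six full documents.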

In-place ETL/Data Processing on MapR Database JSON with Native Hive support

MapR Database 6.0 deeply integrates Apache Hive with MapR Database JSON tables. Customers can use these capabilities to perform ETL/batch processing directly on the data in MapR Database tables.

The new capabilities of this integration include:

  • New Hive storage handler for MapR Database JSON tables
  • Support for extensive Hive SQL functionality & Data types on MapR Database tables

Real-time data integration & Micro-services w/MapR Database Change Data Capture API

Built on the foundations of global table replication and MapR Event Store, the MapR Database Change Data Capture API provides a powerful and easy-to-use interface to support real-time integration of changes arriving at a MapR Database table with arbitrary external systems. Users can now build applications that consume and process MapR Database table data changes, published as ‘change log’ streams, in real time and in a highly scalable way. Change data propagation is granular, down to selected columns/fields, and supports ordered, at-least-once delivery.

This capability enables use cases such as:

  • Track changes happening to MapR Database tables (inserts, updates, deletes) and perform real-time processing on the data
  • Synchronize data in MapR Database with a downstream search index (such as Elasticsearch or Solr), materialized views, or in-memory caches
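
Because delivery is ordered but at-least-once, a consumer that keeps a downstream cache in sync has to tolerate the same change arriving twice. The sketch below shows one common way to do that in plain Python. It does not use the MapR Database Change Data Capture API: the record shape (`key`, `seq`, `op`, `fields`) and the `apply_change` helper are hypothetical, and it assumes each change carries a per-key sequence number.

```python
def apply_change(cache, last_seen, record):
    """Apply one change record idempotently; return False for duplicate redeliveries."""
    key, seq = record["key"], record["seq"]
    if seq <= last_seen.get(key, -1):
        return False  # already applied: at-least-once delivery redelivered it
    if record["op"] == "delete":
        cache.pop(key, None)
    elif record["op"] == "insert":
        cache[key] = dict(record["fields"])
    else:  # "update": merge only the changed fields
        cache.setdefault(key, {}).update(record["fields"])
    last_seen[key] = seq
    return True


# Hypothetical change log, in delivery order, with one redelivered record.
changes = [
    {"key": "u1", "seq": 0, "op": "insert", "fields": {"name": "Ana", "city": "Lisbon"}},
    {"key": "u1", "seq": 1, "op": "update", "fields": {"city": "Porto"}},
    {"key": "u1", "seq": 1, "op": "update", "fields": {"city": "Porto"}},  # duplicate
    {"key": "u2", "seq": 0, "op": "insert", "fields": {"name": "Bo"}},
    {"key": "u2", "seq": 1, "op": "delete", "fields": {}},
]

cache, last_seen = {}, {}
applied = [apply_change(cache, last_seen, record) for record in changes]
```

After the run, the cache holds only the surviving document with its latest field values, and the redelivered update is recognized and skipped.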

All of this new functionality expands the data access capabilities of MapR Database and helps organizations apply it to a variety of use cases such as customer 360, personalization, real-time analytics, IoT, and building scalable, high-performance enterprise apps. General availability of MapR Database 6.0 is in Q4 2017.

This blog post was published September 26, 2017.
