MapR-DB 6.0 - The Modern Database for Global Data-Intensive Applications

Contributed by Neeraja Rentachintala

Today, we are delighted to announce the next level of advancement of the MapR platform with the latest release of MapR Database, MapR-DB 6.0 - The modern database for global data-intensive applications.

At MapR, we are fortunate to work alongside leading Global 2000 companies on their journey to digitally transform themselves. Digital transformation starts with providing rich, contextual user experiences to attract, engage, and retain key stakeholders. Our customers are making big bets on modernizing their core business processes, uncovering real-time insights, and enabling automated decision making to cut costs and innovate faster.

Enabling this business transformation requires these organizations to deal with data and applications at extreme scale. These data-intensive applications need the ability to capture and interact with diverse types of data, such as IoT data, and leverage them to maximize business outcomes. Customers are trying to move from after-the-fact insights and processes to real-time processes and in-the-moment proactive actions by infusing ML/AI-driven data intelligence. Data is the foundational enabler in all these “data-intensive” applications.

As organizations aspire to build these applications, they need critical technology building blocks, and databases are the foundational components of the application architecture for operationalizing the data. While working with these customers over the past few years, we have identified three critical areas of concern around databases.

  1. Databases are not islands any more - To enable next-generation data-intensive applications, the database must be part of a broader data platform that includes analytic processing for in-place intelligence and streaming for real-time data flows, establishing a continuum of data, insights, and operations that come together in real time.

  2. Database selection at organizations today is often a decision of complex trade-offs among critical requirements - Is write ingest speed critical, or read speed? Is data consistency critical, or performance? Is this data really important to me, or is it OK to lose it because app availability is a lot more important? Can I live with coarse-grained security? Do I need wide-column performance or JSON flexibility? The result is that while there are many new databases in the market, they are often used in niche, purpose-built use cases or as part of auxiliary applications, while most mission-critical business apps stay in well-trusted traditional RDBMS systems. Those apps do not leverage modern technology trends and so do not fully achieve the desired business transformation.

  3. Every database app today is a silo in its own infrastructure - The result is hundreds to thousands of non-integrated apps, which is a daunting reality. Databases traditionally are not built to run multiple apps simultaneously while still meeting SLAs. This leads to complex, costly infrastructure and the need to interconnect these apps using fragile data pipelines.

At MapR, our goal has been to build a complete data platform with a built-in, modern, scalable database for creating these breakthrough data-intensive applications across on-prem, edge, and multi-cloud environments, with no complex trade-offs or compromises.

MapR-DB enables a broad variety of applications by bringing critical database capabilities into one system, as shown below.

MapR-DB Diagram

MapR-DB Continuous Innovation & What’s new in 6.0

Over the last 3 years, MapR has systematically built MapR-DB to be a converged and complete database. The latest MapR-DB 6.0 release delivers on this broader vision.

Here is the evolution of the database over the past few years, leading up to MapR-DB 6.0.

MapR-DB Milestones

MapR-DB 6.0 is a significant milestone. With this release, we are introducing several new capabilities and performance improvements to expand the usage of the database in organizations.

Here is a summary of the key features in this release.

Powerful and efficient data access w/native secondary indexes

Prior to 6.0, MapR-DB was optimized for access based only on rowkey. The new built-in secondary indexes expand on this by supporting flexible and efficient queries on any column in the DB tables at scale. This enables application developers to build rich, new types of applications that support complex user interaction patterns, and lets business users run optimized, high-performance SQL queries using familiar BI/Analytics tools.

The key features of the Secondary Indexing functionality include:

  • Native Secondary indexes for MapR-DB JSON tables - no external indexing system such as Elasticsearch or Solr necessary
  • Scalable & Enterprise grade indexing with auto-propagation, auto-scale & auto-management
  • Extreme index scalability & performance with SSD optimizations
  • Rich indexing functionality - unlimited indexes, composite indexes with a large number of columns, comprehensive data type support, hashed indexes, covering/non-covering query support, security, and more
  • Highly functional and seamless queries across primary and secondary index tables
  • Optimized index based access for application development & BI/Analytics
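To make the covering/non-covering distinction above concrete, here is a minimal, purely illustrative sketch in plain Python. It does not use any MapR-DB API: the in-memory `table`, the field names, and the helpers `build_index` and `query` are all hypothetical. It only shows the general idea that an index maps a field value to rowkeys, and that a covering index can answer a query without going back to the primary table.

```python
from collections import defaultdict

# Toy "table": rowkey -> JSON-like document. All names here are hypothetical.
table = {
    "u001": {"name": "Ana", "city": "Lisbon", "age": 34},
    "u002": {"name": "Bo", "city": "Oslo", "age": 41},
    "u003": {"name": "Cy", "city": "Lisbon", "age": 29},
}

def build_index(table, field, included=()):
    """Map each value of `field` to (rowkey, included-fields) entries."""
    index = defaultdict(list)
    for rowkey, doc in table.items():
        index[doc[field]].append((rowkey, {f: doc[f] for f in included}))
    return index

def query(table, index, value, wanted, included=()):
    """Return the `wanted` fields of rows whose indexed field equals `value`."""
    results = []
    for rowkey, stored in index.get(value, []):
        if set(wanted) <= set(included):
            # Covering case: the index entry already holds every requested field.
            results.append({f: stored[f] for f in wanted})
        else:
            # Non-covering case: follow the rowkey back to the primary table.
            results.append({f: table[rowkey][f] for f in wanted})
    return results

city_index = build_index(table, "city", included=("name",))
covered = query(table, city_index, "Lisbon", wanted=("name",), included=("name",))
looked_up = query(table, city_index, "Lisbon", wanted=("name", "age"), included=("name",))
print(covered)
print(looked_up)
```

In both cases only the matching index entries are touched rather than every row, which is the reason an index avoids a full table scan.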

Rich and expanded application development with MapR-DB OJAI 2.0 APIs

OJAI (Open JSON Application Interface) is the API for developing applications with the MapR-DB document data model. In 6.0, we are expanding the API with more functionality and better performance.

The new capabilities include:

  • New & intuitive OJAI query interface
  • JSON grammar & Fluent API semantics
  • Rich expressive language support including conditional filtering, sorting & pagination support
  • Efficient queries w/seamless index based access
  • Smart query execution to support operational and operational analytic applications on any data scale and with any query complexity
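As a rough illustration of what a fluent query with conditional filtering, sorting, and pagination looks like, here is a small self-contained Python sketch. It is not the OJAI API: the `QuerySketch` class, its method names, the keys of the query document it builds, and the `run` helper are all assumptions made for this example, and the query is evaluated against an in-memory list rather than a MapR-DB table.

```python
import json

class QuerySketch:
    """Hypothetical fluent builder for illustration; this is not the OJAI API."""

    def __init__(self):
        self._query = {}

    def select(self, *fields):
        self._query["select"] = list(fields)
        return self

    def where_equals(self, field, value):
        self._query["where"] = {"field": field, "equals": value}
        return self

    def order_by(self, field, descending=False):
        self._query["order_by"] = {"field": field, "descending": descending}
        return self

    def offset(self, count):
        self._query["offset"] = count
        return self

    def limit(self, count):
        self._query["limit"] = count
        return self

    def build(self):
        return dict(self._query)


def run(query, docs):
    """Evaluate a built query against an in-memory list of documents."""
    rows = list(docs)
    if "where" in query:
        cond = query["where"]
        rows = [d for d in rows if d.get(cond["field"]) == cond["equals"]]
    if "order_by" in query:
        order = query["order_by"]
        rows.sort(key=lambda d: d[order["field"]], reverse=order["descending"])
    start = query.get("offset", 0)
    stop = start + query["limit"] if "limit" in query else None
    rows = rows[start:stop]
    if "select" in query:
        rows = [{f: d[f] for f in query["select"]} for d in rows]
    return rows


# Hypothetical sample documents.
orders = [
    {"_id": "o1", "status": "open", "total": 40},
    {"_id": "o2", "status": "closed", "total": 15},
    {"_id": "o3", "status": "open", "total": 75},
    {"_id": "o4", "status": "open", "total": 20},
]

query = (QuerySketch()
         .select("_id", "total")
         .where_equals("status", "open")
         .order_by("total", descending=True)
         .offset(1)
         .limit(2)
         .build())
page = run(query, orders)
print(json.dumps(query))
print(page)
```

The built query is itself a JSON-serializable document, which is the idea behind pairing a JSON grammar with fluent API semantics: the same query can be written as chained calls or as data.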

Optimized Drill/DB integration for in-place SQL Data Exploration & Operational BI

Apache Drill provides flexible SQL analytics on the data in MapR-DB JSON tables. Drill is a distributed SQL query engine and serves as a unified interactive access layer for the MapR platform bringing together data from MapR-FS and MapR-DB.

The new capabilities of the MapR-DB and Drill integration optimize SQL data access on MapR-DB, speeding up ad-hoc queries. The new capabilities include:

  • Ability for a variety of Drill SQL queries to seamlessly leverage MapR-DB secondary indexes to significantly speed up query performance & avoid large scans
  • Index selection based on statistics, selectivity, and cost
  • Index support for Filter/Sort/Offset/Limit operators
  • Comprehensive index functionality support including single, composite, covering/non-covering indexes, and index intersection
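The idea behind selectivity- and cost-based index selection can be sketched in a few lines of Python. This is a toy model, not Drill's planner: the `AccessPath` type, the cost constants, and the selectivity figures are all made-up assumptions. It only illustrates why a planner might pick a highly selective index, yet fall back to a full scan when the only available index matches a large fraction of rows and needs a lookup per match.

```python
from dataclasses import dataclass

@dataclass
class AccessPath:
    name: str
    selectivity: float  # estimated fraction of rows matched (1.0 for a full scan)
    covering: bool      # True if no lookup back to the primary table is needed

def estimated_cost(path, total_rows, scan_cost=1.0, lookup_cost=4.0):
    """Toy cost: rows read, plus a per-row lookup penalty for non-covering paths."""
    rows = path.selectivity * total_rows
    cost = rows * scan_cost
    if not path.covering:
        cost += rows * lookup_cost
    return cost

def choose_path(paths, total_rows):
    """Pick the access path with the lowest estimated cost."""
    return min(paths, key=lambda p: estimated_cost(p, total_rows))

# Hypothetical access paths for a table of one million rows.
full_scan = AccessPath("full_scan", selectivity=1.0, covering=True)
idx_status = AccessPath("idx_status", selectivity=0.4, covering=False)
idx_zip = AccessPath("idx_zip", selectivity=0.01, covering=False)

best = choose_path([full_scan, idx_status, idx_zip], total_rows=1_000_000)
fallback = choose_path([full_scan, idx_status], total_rows=1_000_000)
print(best.name, fallback.name)
```

With these assumed numbers, the selective index wins when it is available, and the full scan wins over the low-selectivity, non-covering index.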

In-place Advanced analytics/ML on MapR-DB JSON with Native Spark connectivity

MapR-DB 6.0 deeply integrates Apache Spark with MapR-DB JSON tables. Customers can use these capabilities to perform real-time data processing as well as build and serve machine learning models on MapR-DB tables directly, without creating analytic silos.

The new capabilities of this integration include:

  • Batch & real-time data processing support with native Spark connectivity
  • Support for all key Spark constructs - RDDs, DataFrames/Datasets
  • Optimized Spark performance with projection and filter pushdown

In-place ETL/Data Processing on MapR-DB JSON with Native Hive support

MapR-DB 6.0 deeply integrates Apache Hive with MapR-DB JSON tables. Customers can use these capabilities to perform ETL/batch processing of the data in MapR-DB tables directly.

The new capabilities of this integration include:

  • New Hive storage handler for MapR-DB JSON tables
  • Support for extensive Hive SQL functionality & data types on MapR-DB tables

Real-time data integration & Micro-services w/MapR-DB Change Data Capture API

Built on the foundations of global table replication and MapR Event Streaming, the MapR-DB Change Data Capture API provides a powerful and easy-to-use interface to support real-time integration of changes arriving at a MapR-DB table with arbitrary external systems. Users can now build applications that consume and process MapR-DB table data changes, published as ‘change log’ streams, in real time and in a highly scalable way. The change data propagation is granular for selected columns/fields and supports ordered, at-least-once delivery.

This capability enables use cases such as:

  • Track changes happening to MapR-DB tables (inserts, updates, deletes) and perform real-time processing on the data
  • Synchronize data in MapR-DB with a downstream search index (such as Elasticsearch or Solr), materialized views, or in-memory caches
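Because delivery is ordered and at-least-once, a consumer that keeps a downstream cache in sync may see the same change more than once and should apply changes idempotently. The Python sketch below illustrates that pattern only. It does not use the MapR-DB Change Data Capture API: the event shape (`key`, `seq`, `op`, `fields`), the per-key sequence number, and the `apply_changes` helper are assumptions made for this example.

```python
def apply_changes(cache, last_applied, events):
    """Apply ordered change events to `cache`, skipping redelivered ones."""
    for event in events:
        key, seq = event["key"], event["seq"]
        if seq <= last_applied.get(key, -1):
            continue  # already applied: a duplicate from at-least-once delivery
        if event["op"] == "delete":
            cache.pop(key, None)
        else:
            # Insert or update: merge only the changed fields into the cached row.
            cache.setdefault(key, {}).update(event["fields"])
        last_applied[key] = seq
    return cache

# Hypothetical change log; the third and last events are redeliveries.
events = [
    {"key": "u1", "seq": 1, "op": "insert", "fields": {"name": "Ana", "city": "Lisbon"}},
    {"key": "u1", "seq": 2, "op": "update", "fields": {"city": "Porto"}},
    {"key": "u1", "seq": 2, "op": "update", "fields": {"city": "Porto"}},
    {"key": "u2", "seq": 1, "op": "insert", "fields": {"name": "Bo"}},
    {"key": "u2", "seq": 2, "op": "delete", "fields": {}},
    {"key": "u1", "seq": 1, "op": "insert", "fields": {"name": "Ana", "city": "Lisbon"}},
]

cache = apply_changes({}, {}, events)
print(cache)
```

Tracking the last applied sequence number per key is one simple way to make replays harmless; a real consumer would also need to persist that state alongside the cache.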

All the new functionality expands the data access capabilities of MapR-DB and helps you leverage it in a variety of use cases such as customer 360, personalization, real-time analytics, IoT, and building scalable, high-performance enterprise apps. MapR-DB 6.0 will be generally available in Q4 2017.

This blog post was published September 26, 2017.