Architecture

MapR Database is an enterprise-grade, high performance, NoSQL (“Not Only SQL”) database management system. You can use it to add realtime, operational analytics capabilities to big data applications. As a multi-model NoSQL database, it supports both JSON document models and wide column data models.

Why use MapR Database?

  • Integrated analytics with SQL: MapR Database's integration with Drill for MapR provides a low latency, distributed, SQL-like query engine for large-scale datasets, including structured and semi-structured, nested data.
  • Operational analytics: MapR Database can run in the same cluster as Apache™ Hadoop® and Apache Spark, letting you immediately analyze or process live, interactive data. This also enables you to eliminate data silos to speed the data-to-action cycle, providing a more efficient data architecture.
  • Global distribution of applications: Application access to MapR Database tables is distributable on a global scale.
  • Flexible data model: You can use MapR Database as both a document database and a wide-column database. As a document database, MapR Database stores JSON documents in JSON tables. As a wide-column database, it stores binary files in binary tables.

How is MapR Database Related to MapR Filesystem?

MapR Database implements tables within the framework of the MapR file system MapR Database creates tables (both binary and JSON tables) in logical units called volumes.



What are MapR Database's Architectural Advantages?

MapR Database's architecture has the following advantages:

  • It reduces process overhead because it has no extra layers to pass through when performing operations on data.

    MapR Database, like several other NoSQL databases, is a log-based database. MapR Database runs inside of the MapR file system process, which enables it to read from and write to disks directly. In contrast, other NoSQL databases must communicate with a separate process to performs disk reads and writes. The approach taken by MapR Database eliminates extra process hops, duplicate caching, and needless abstractions, with the consequence of optimizing I/O operations on your data.

  • It minimizes compaction delays because it avoids I/O storms when it merges logged operations with structures on disk.

    As a log-based database, MapR Database must write logged operations to disk. MapR Database stores table regions (also called tablets) and smaller structures within them partially as b-trees. Together with write-ahead logs (WAL), these b-trees comprise log-structured-merge trees. Write-ahead logs for the smaller structures within regions are periodically restructured by rolling merge operations on the b-trees. Because MapR Database performs these merges at small scales, applications running against MapR Database see no significant effects on latency while the merges are taking place.
    Note: Apache HBase also uses the term regions. MapR has deprecated support for Apache HBase.

What Design Factors are Important when Using MapR Database?

  • Rowkey Optimization: The design of a table's rowkeys affects the speed at which client applications can access data. It also impacts database performance if hotspotting occurs. The better the design, the faster the data access. See Table Rowkey Design for more information.
  • Column Family Optimization: Column families enable you to group related sets of data and restrict queries to a defined subset, leading to better performance. When you design a column family, think about what kinds of queries you are going to use most often, and group your columns accordingly. See Column Families in JSON Tables and Column Families in Binary Tables for more information.
  • Replication Implementation: The design of table replication (in addition to the automatic replication that occurs with table regions within a volume) depends on your desired outcome and the complexity of your environment. See Table Replication for more information.
  • Security Implementation: You can implement security at many different levels including for table replication, JSON documents, and general access. Determining what level and where is part of the architectural design. See Security on JSON Tables, Security and Replication, and Security.