MapR Database

Distributed, Scalable, NoSQL Database for Global Data-Intensive Applications

What Is MapR Database?

MapR Database is a high-performance NoSQL (“Not Only SQL”) database management system built into the MapR Data Platform. It is a highly scalable multi-model database that brings together operations and analytics as well as real-time streaming and database workloads to enable a broader set of next-generation data-intensive applications in organizations.

MapR Database is enterprise grade and scales to extreme sizes, providing real-time operations and analytics without the extra complexity and cost of a traditional database architecture. Additional benefits include:

  • Multi-model flexibility – MapR Database provides key-value, wide-column, and JSON document models, enabling more complex applications that need multiple data models. It can run existing HBase API applications without the need for Java virtual machines, and it handles a wide range of operational data formats, allowing for a faster, more flexible development cycle.
  • Real-time, event-driven data – MapR Database takes a platform approach to NoSQL, supporting real-time analytics alongside other database workloads at enterprise-level size and speed. Platform and file system optimizations allow for extreme scalability on thousands of nodes per cluster and trillions of records per table.
  • Mission-critical reliability and performance – MapR Database minimizes downtime and eliminates single points of failure through a strong consistency model and replication. It delivered consistent throughput in benchmark tests, and its data structure innovations keep latency consistently low across a range of test scenarios. MapR Database's optimization happens automatically, without the need for additional code.

WHY MAPR DATABASE?

Challenges with Previous Technologies

Challenges with Relational Databases

Relational databases were the standard for years, so what changed? With more and more data came the need to scale. Relational databases are ideal for handling structured, predictably sized data sets on a single node; they were not designed to run on clusters.

With a relational database, you normalize your schema, which eliminates redundant data and makes storage efficient, and then you use indexes and queries with joins to bring the data back together. Indexes slow down data ingestion with lots of non-sequential disk I/O, and joins become read bottlenecks as data volumes grow. The relational model does not scale horizontally across a cluster, so relational databases are not designed to cost-effectively deliver the performance and scale that big data requires.

Challenges with Other NoSQL Databases

While other NoSQL databases may offer advantages in performance and cost savings, many are still maturing as reliable database options. Some require more administrative attention and maintenance, resulting in longer downtimes and network bottlenecks. Organizations using them can also face data inconsistency and data loss during replication, as well as a lack of granular access controls.

Other NoSQL databases have the following disadvantages:
  • HBase compactions generate heavy disk I/O, which results in inconsistent, sometimes low throughput and high latency.
  • HBase recovers slowly from node crashes, so the data in partitions hosted on a failed node is unavailable until recovery completes.
  • Cassandra's architecture is designed to deliver high availability at the cost of consistency, but according to Robert Yokota of Yammer, Cassandra was not more reliable than Yammer's strongly consistent systems, and it was more difficult to work with and reason about in the presence of inconsistencies.
  • With Cassandra, downed nodes are a common cause of data inconsistency, which must be routinely fixed by manually running an anti-entropy repair tool.
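
The normalize-then-join pattern described above can be seen in miniature. This plain-Python sketch is illustrative only: no database is involved, and the customer and order data are hypothetical. The normalized layout must match foreign keys across two tables to rebuild a customer's view, while the denormalized layout stores the data that is read together under a single row key.

```python
# Normalized layout: customers and orders live in separate "tables" (hypothetical data).
customers = {1: {"name": "Ada"}}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]


def customer_view_normalized(customer_id: int) -> dict:
    """Rebuild a customer's view with a join: match orders on the customer_id foreign key."""
    customer = customers[customer_id]
    matching = [o for o in orders if o["customer_id"] == customer_id]
    return {
        "name": customer["name"],
        "orders": [{"order_id": o["order_id"], "total": o["total"]} for o in matching],
    }


# Denormalized layout: data that is read together is stored together under one row key.
documents = {
    "customer#1": {
        "name": "Ada",
        "orders": [{"order_id": 10, "total": 25.0}, {"order_id": 11, "total": 40.0}],
    },
}


def customer_view_denormalized(row_key: str) -> dict:
    """A single key lookup returns the whole view; no join is needed."""
    return documents[row_key]
```

A relational engine would use an index rather than a full scan for the join, but it still has to touch two tables and, on a cluster, potentially several nodes; the denormalized read is one key lookup.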

Advantages of MapR Database

MapR Database Is Designed to Scale

Organizations are turning to solutions that can scale to handle big data and its underlying workloads, such as data processing and analytics. MapR Database was designed for cost-effective scalability with a distributed cluster architecture, which traditional databases can't deliver.

With MapR Database, a table is automatically partitioned across a cluster by key range, and each server is responsible for a subset of the table, which makes reads and writes by row key fast and scalable. Partitioning happens in real time with no manual intervention, and distributing database processing across the cluster in this way dramatically improves performance. Data that is accessed together is stored together in a denormalized schema, for example in JSON documents, so reads avoid joins and stay fast as the data grows.

MapR Database Is Designed for Reliability and Performance

To understand how MapR Database does what others can't, you need to understand what makes the MapR Data Platform unique. MapR Database tables, MapR XD files, and MapR Event Store streams are integrated into MapR XD, a high-scale, reliable, globally distributed data store that provides replication with 24/7 reliability and zero data loss. MapR XD implements a random read-write file system and accesses disks directly, so MapR Database can update files efficiently instead of always writing new immutable files as HBase and Cassandra do.

As a result, MapR Database has several advantages over other NoSQL databases. For more information, read An In-Depth Look at How MapR Database Does What Cassandra, HBase, and Others Can't.
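
Key-range partitioning with automatic splitting can be modeled in a few lines. This is a simplified, in-memory sketch and not MapR Database's implementation: the class name `RangePartitionedTable`, the split-at-median rule, and the row-count threshold are assumptions chosen for illustration. Persistence, replication, and the placement of tablets on nodes are left out.

```python
import bisect


class RangePartitionedTable:
    """Hypothetical toy model of a table split into tablets by contiguous row-key ranges."""

    def __init__(self, max_rows_per_tablet: int = 4):
        self.max_rows = max_rows_per_tablet
        self.start_keys = [""]  # tablet i covers [start_keys[i], start_keys[i + 1])
        self.tablets = [{}]     # rows stored in each tablet, keyed by row key

    def _locate(self, row_key: str) -> int:
        # Rightmost tablet whose start key is <= row_key (binary search over sorted ranges).
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key: str, value) -> None:
        i = self._locate(row_key)
        self.tablets[i][row_key] = value
        if len(self.tablets[i]) > self.max_rows:
            self._split(i)  # split automatically; no operator involvement

    def get(self, row_key: str):
        return self.tablets[self._locate(row_key)].get(row_key)

    def _split(self, i: int) -> None:
        # Assumed rule: split the oversized tablet at its median key into two adjacent ranges.
        keys = sorted(self.tablets[i])
        mid_key = keys[len(keys) // 2]
        rows = self.tablets[i]
        self.tablets[i] = {k: v for k, v in rows.items() if k < mid_key}
        self.tablets.insert(i + 1, {k: v for k, v in rows.items() if k >= mid_key})
        self.start_keys.insert(i + 1, mid_key)


table = RangePartitionedTable(max_rows_per_tablet=4)
for key in ["user05", "user01", "user09", "user03", "user07", "user02"]:
    table.put(key, {"id": key})
```

Because each tablet owns one contiguous key range, a lookup is a binary search over the split points, and different tablets can be served by different servers. A range scan only needs to touch the tablets whose ranges overlap it.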

KEY BENEFITS OF MAPR DATABASE

MULTI-MODEL FLEXIBILITY

MapR Database supports multiple data models including document, wide-column, key-value, and time series on a unified foundation.

NATIVE JSON SIMPLICITY WITH EXPRESSIVE QUERIES

MapR Database is a highly scalable document database with native JSON support. Its intuitive and expressive OJAI (Open JSON Application Interface) query API lets you build powerful applications.
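
To show what querying nested JSON fields involves, here is a plain-Python stand-in for a document query that combines a condition on a nested field path with a projection. It does not use the OJAI API. The helper names `get_path` and `find` are hypothetical, and the dotted-path notation is a convention chosen for this sketch.

```python
from typing import Any


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted field path such as 'address.city'; return None if any step is missing."""
    node: Any = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def find(docs, where_path: str, equals: Any, fields: list) -> list:
    """Hypothetical query: project `fields` from documents whose where_path equals the value."""
    return [
        {f: get_path(doc, f) for f in fields}
        for doc in docs
        if get_path(doc, where_path) == equals
    ]


users = [
    {"_id": "u1", "name": "Ada", "address": {"city": "London", "zip": "N1"}},
    {"_id": "u2", "name": "Grace", "address": {"city": "New York"}},
    {"_id": "u3", "name": "Alan"},  # documents in one collection need not share a schema
]

londoners = find(users, "address.city", "London", ["_id", "name"])
```

A document database evaluates conditions like this on the server and can use indexes; the sketch only shows the semantics of filtering on a nested path and returning selected fields.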

EXTREME PERFORMANCE AND EFFORTLESS HORIZONTAL SCALE

In recent benchmarks validated by ESG, MapR Database was 2.5x faster than Cassandra and 5.5x faster than HBase on average across all workloads.

STRONG CONSISTENCY – NO DATA LOSS

MapR Database is strongly consistent by default and at all times. In-sync replication, with a replication factor of 3, is always on. Once data is acknowledged, it will never be lost or corrupted.
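
The rule that a write is acknowledged only once every in-sync replica has it can be modeled as follows. This is a hypothetical toy, not MapR's replication protocol: the `Replica` and `InSyncReplicatedTable` classes are invented for illustration, and the only failure the model handles is a replica that is unavailable before the write starts.

```python
class Replica:
    """One copy of the data; in this toy model it can only fail by being unavailable."""

    def __init__(self, name: str):
        self.name = name
        self.data: dict = {}
        self.available = True


class InSyncReplicatedTable:
    """Hypothetical table that acknowledges a write only after all replicas apply it."""

    def __init__(self, replication_factor: int = 3):
        self.replicas = [Replica(f"replica-{i}") for i in range(replication_factor)]

    def put(self, key, value) -> bool:
        # Refuse the write up front if any replica in the set is unavailable.
        if not all(r.available for r in self.replicas):
            return False  # not acknowledged; the caller should retry
        for r in self.replicas:
            r.data[key] = value
        return True  # acknowledged: every replica now holds the value

    def get(self, key, replica_index: int = 0):
        # Every replica holds every acknowledged write, so any of them can answer.
        return self.replicas[replica_index].data.get(key)


table = InSyncReplicatedTable(replication_factor=3)
acked = table.put("order#1", {"status": "paid"})
table.replicas[2].available = False
rejected = table.put("order#2", {"status": "new"})
```

A production system would typically remove a failed replica from the in-sync set and rebuild it rather than refuse all writes; the sketch omits that so the acknowledgment rule stays visible.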

EXTREME HIGH AVAILABILITY

MapR Database inherits the enterprise features of the underlying platform with respect to failure handling, recovery, and resiliency.

GLOBAL MULTI-MASTER REPLICATION

MapR Database replicates tables across clusters in a multi-master configuration, so applications in different locations can read and write their own local copy of the same table.

OPTIMIZED MULTI-TENANCY FOR THOUSANDS OF APPS

The MapR Data Platform provides volume- and topology-based data placement controls to support multi-tenancy, which means multiple MapR Database applications can run securely and independently in the same cluster without impacting SLAs. This results in lower administrative and hardware costs.

IN-PLACE SQL AND ADVANCED ANALYTICS/ML

MapR Database is natively integrated with machine learning and analytical tools to enable advanced analytics, data exploration, and interactive SQL, letting you immediately analyze or process live data and apply machine learning.

ROBUST SECURITY AND FINE-GRAINED ACCESS CONTROL

MapR Database lets you set security policies at the sub-document and individual element level, so you can apply strict policies to just the confidential elements instead of to whole documents.
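
Element-level policies can be pictured as a filter applied to each document before it is returned to a caller. The sketch below is hypothetical: the `FIELD_POLICY` format, the role names, and the `redact` function are invented for illustration and are not MapR Database's access-control syntax.

```python
import copy

# Hypothetical policy format: dotted field path -> roles allowed to read that field.
# Fields not listed are readable by everyone.
FIELD_POLICY = {
    "ssn": {"compliance"},
    "payment.card_number": {"compliance", "billing"},
}


def redact(doc: dict, roles: set, policy: dict = FIELD_POLICY) -> dict:
    """Return a copy of doc without the fields that none of the caller's roles may read."""
    visible = copy.deepcopy(doc)
    for path, allowed in policy.items():
        if roles & allowed:
            continue  # caller holds at least one permitted role
        *parents, leaf = path.split(".")
        node = visible
        for part in parents:
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(leaf, None)  # remove only the protected element, not the whole document
    return visible


record = {
    "name": "Ada",
    "ssn": "000-00-0000",
    "payment": {"card_number": "4111111111111111", "currency": "EUR"},
}
```

A caller with the `billing` role sees the card number but not the SSN, while a caller with no roles still gets the rest of the document, including `payment.currency`.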

MapR Database Demonstration
A Database for Next-Generation Web Applications

This demo shows how MapR Database can be used as a back-end to web applications requiring features such as multi-tenancy across clouds and full text search. It demonstrates how MapR Database's Change Data Capture (CDC) stream can be used to couple other data services, such as Elasticsearch, into web back-ends for multi-cloud deployments.
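
The CDC pattern in this demo amounts to consuming a stream of change events and applying each one to a secondary service. In the sketch below, a small in-memory inverted index stands in for Elasticsearch. The event shape (`op`, `id`, `doc`) and the `description` field are assumptions for illustration; the actual CDC record format is not described here.

```python
from collections import defaultdict


class SearchIndex:
    """Toy inverted index standing in for an external search service."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_terms = {}               # id -> terms currently indexed for that id

    def upsert(self, doc_id, text):
        self.delete(doc_id)  # drop stale terms from any earlier version
        terms = set(text.lower().split())
        for term in terms:
            self.postings[term].add(doc_id)
        self.doc_terms[doc_id] = terms

    def delete(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def apply_change(index, event):
    """Apply one change event of the hypothetical shape {'op', 'id', 'doc'}."""
    if event["op"] in ("insert", "update"):
        index.upsert(event["id"], event["doc"]["description"])
    elif event["op"] == "delete":
        index.delete(event["id"])


changes = [
    {"op": "insert", "id": "p1", "doc": {"description": "Red running shoes"}},
    {"op": "insert", "id": "p2", "doc": {"description": "Blue running jacket"}},
    {"op": "update", "id": "p1", "doc": {"description": "Red trail shoes"}},
    {"op": "delete", "id": "p2"},
]
index = SearchIndex()
for event in changes:
    apply_change(index, event)
```

Processing events in stream order matters: applying the `update` for `p1` before its `insert` would leave the stale description in the index.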

KEY USE CASES FOR MAPR DATABASE

CORE BUSINESS APPLICATIONS

These applications are fundamental to running a business. For example, fraud prevention is a key process for payment processing. Other core business applications include inventory management, risk analysis, churn detection, and biometric verification. Organizations want to build next-generation core applications that support real-time business processes optimized using analytics/ML.

IoT

IoT refers to applications that primarily operate on data created by sensors, devices, and machines. The use cases include predictive maintenance, real-time operations dashboards, alerting, real-time tuning of devices, and quality assurance.

OPERATIONAL DATA HUB

MapR Database brings all data together in a data hub in real time to provide immediate business insights. It is needed for data that changes frequently, such as data from applications and transactional systems. In data lakes, relational data would typically be stored in Parquet or Avro files and accessed through Apache Hive. Because Parquet is a write-once file format, a Parquet file cannot be updated after it is written. MapR Database, by contrast, can store and update frequently changing data.
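
The difference between a write-once file and an updatable table shows up in how much data must be rewritten to change one record. This toy sketch counts the rows written in each case. `WriteOnceFile` is a hypothetical stand-in for an immutable Parquet file. Alternatives such as writing separate delta files and merging them at read time are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteOnceFile:
    """Hypothetical stand-in for an immutable file: rows can be read, never modified."""
    rows: tuple  # (key, value) pairs


def update_write_once(file: WriteOnceFile, key, new_value):
    """Changing one row means writing a complete new file; returns (new_file, rows_written)."""
    new_rows = tuple((k, new_value if k == key else v) for k, v in file.rows)
    return WriteOnceFile(new_rows), len(new_rows)


def update_in_place(table: dict, key, new_value) -> int:
    """An updatable store rewrites only the affected record; returns rows written."""
    table[key] = new_value
    return 1


original = WriteOnceFile(tuple((f"sku{i}", i) for i in range(1000)))
rewritten, file_rows_written = update_write_once(original, "sku7", -1)

table = dict(original.rows)
table_rows_written = update_in_place(table, "sku7", -1)
```

Changing a single value costs 1,000 row writes in the write-once case and one in the updatable case, and the gap grows with file size.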

CATALOG LIST/METADATA MANAGEMENT

This use case consists of catalogs or other metadata used to manage key business entities from enterprises and online services. These entities could be SKUs, inventory parts, sensors, stock trades, playlists, or measurement results.

CONTEXTUAL USER EXPERIENCE

This use case involves customizing user experience based on user activity. Common examples include personalized recommendations on video-on-demand services and e-commerce.

SINGLE VIEW

Businesses typically use multiple enterprise applications, which means data related to a single business entity can often lie in multiple data silos. “Single view” means providing one place to find all information about a business entity. The most common example of this use case is Customer 360.

WHY MAPR DATABASE MATTERS TO YOU

DEVELOPERS / DATA ENGINEERS

  • MapR Database supports multiple data models, including wide-column, document, key-value, and time series, on a unified foundation.
  • MapR Database is natively integrated with machine learning and analytical processing to enable advanced analytics, data exploration, and interactive SQL.
  • In recent benchmarks validated by ESG, MapR Database was observed to be 2.5x faster than Cassandra and 5.5x faster than HBase on average across all workloads.
  • MapR Database is integrated with MapR Event Store for Apache Kafka out of the box for real-time data flows. MapR Event Store is a global event streaming system that enables real-time data ingestion and stream processing.
  • The MapR Database Connector for Apache Spark provides easier, faster data pipelines:
    • Build complex ETL pipelines that can speed up data ingestion and deliver superior performance.
    • Combine event streams with machine learning to handle the logistics of machine learning.
  • Persist data for containerized applications. MapR Data Fabric for Kubernetes allows MapR volumes to be mounted for access by containers.
  • Scale data as containers grow. With a “grow as you go” feature, MapR handles growth in data without having to move it to a separate, dedicated environment.

IT / STORAGE ADMINISTRATOR

  • The MapR Data Platform is built for production. Consistent snapshots, replicas, and mirroring deliver enterprise-grade high availability and disaster recovery.
  • MapR XD is multi-tenant by design. Assign policies (quotas, permissions, placement) to logical units of management called volumes.
  • Balance cost and performance with MapR XD. Leverage policy-based data tiering, erasure coding, data placement, and more.
  • MapR Database is strongly consistent by default and at all times. In-sync replication, with a replication factor of 3, is always on. Once data is acknowledged, it will never be lost or corrupted.
  • MapR Database inherits the enterprise features of the underlying platform with production-ready failure handling, recovery, and resiliency.
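
Volume-level policies are what make multi-tenancy manageable: each tenant's data lives in its own volume with its own limits. The toy model below is hypothetical and covers quotas only. The split into a hard limit that refuses writes and a lower advisory threshold that only records an alert is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical volume: a unit of management that carries its own quota policy."""
    name: str
    hard_quota_bytes: int      # assumed: writes beyond this are refused
    advisory_quota_bytes: int  # assumed: crossing this only records an alert
    used_bytes: int = 0
    alerts: list = field(default_factory=list)

    def write(self, nbytes: int) -> bool:
        if self.used_bytes + nbytes > self.hard_quota_bytes:
            return False  # refused: this tenant's hard limit would be exceeded
        self.used_bytes += nbytes
        if self.used_bytes > self.advisory_quota_bytes:
            self.alerts.append(f"{self.name}: advisory quota exceeded")
        return True


team_a = Volume("team-a", hard_quota_bytes=100, advisory_quota_bytes=80)
team_b = Volume("team-b", hard_quota_bytes=500, advisory_quota_bytes=400)

results = [team_a.write(70), team_a.write(20), team_a.write(20), team_b.write(300)]
```

Because limits are tracked per volume, team-a reaching its quota has no effect on writes to team-b.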

Why Is MapR Database with the MapR Data Platform Better?

A confluence of several technology shifts has dramatically changed big data and machine learning applications. The combination of distributed computing, streaming analytics, and machine learning is accelerating the development of next-generation intelligent applications, which take advantage of modern computational paradigms and modern infrastructure. The MapR Data Platform integrates global event streaming, real-time database capabilities, and scalable enterprise storage with Hadoop, HBase, Spark, Drill, and machine learning libraries to power this new generation of data processing pipelines and intelligent applications. Diverse and open APIs allow all types of analytics workflows to run on the data in place:

  • The MapR XD Distributed File and Object Store is designed to store data at exabyte scale, support trillions of files, and combine analytics and operations in a single platform. MapR XD supports industry-standard protocols and APIs, including POSIX, NFS, S3, and HDFS. Unlike Apache HDFS, which is write-once and append-only, the MapR Data Platform delivers a true read-write, POSIX-compliant file system. Support for the HDFS API enables Spark and Hadoop ecosystem tools, for both batch and streaming, to interact with MapR XD. Support for POSIX enables Spark and all non-Hadoop libraries to read and write to the distributed data store as if the data were mounted locally, which greatly expands the possible use cases for next-generation applications. Support for an S3-compatible API means MapR XD can also serve as the foundation for Spark applications that leverage object storage.
  • The MapR Event Store for Apache Kafka is the first big-data-scale streaming system built into a unified data platform and the only big data streaming system to support global event replication reliably at IoT scale. Support for the Kafka API enables Spark Streaming applications to interact with data in real time in a unified data platform, which minimizes maintenance and data copying.
  • MapR Database is a high-performance NoSQL database built into the MapR Data Platform. MapR Database is multi-model: wide-column, key-value with the HBase API or JSON (document) with the OJAI API. MapR Database can also be queried with Apache Spark or Apache Drill SQL:
    • The MapR Database Connector for Apache Spark enables users to do the following:
      • Use MapR Database as a sink for Spark Streaming
      • Perform complex SQL queries and updates on top of MapR Database while applying critical techniques such as projection and filter pushdown, custom partitioning, and data locality
    • Apache Drill is an open source distributed SQL query engine that delivers fast and secure self-service BI SQL analytics at scale. Drill supports a variety of SQL and NoSQL databases and file systems, including MapR Database, MapR XD Distributed File and Object Store, HDFS, HBase, MongoDB, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS, and local files. A single query can join data from multiple datastores.
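
Joining data from two different stores in one query ultimately means reading rows from each source and matching them on a shared key. The sketch below shows the textbook hash-join technique on two hypothetical in-memory sources, a list of documents and a CSV file. It is not Drill's execution engine, which also handles query planning, distribution, and pushdown.

```python
import csv
import io


def hash_join(left_rows, right_rows, key):
    """Inner hash join: build a hash table on right_rows, then probe it with left_rows."""
    buckets = {}
    for right in right_rows:
        buckets.setdefault(right[key], []).append(right)
    for left in left_rows:
        for right in buckets.get(left[key], []):
            yield {**left, **right}  # merged row; right-side values win on name clashes


# Source 1: documents, as they might come from a document table (hypothetical data).
customers = [
    {"customer_id": "1", "name": "Ada"},
    {"customer_id": "2", "name": "Grace"},
]

# Source 2: a CSV file; csv.DictReader yields every field as a string.
orders_csv = "order_id,customer_id,total\n10,1,25.00\n11,2,40.00\n12,1,5.50\n"
orders = list(csv.DictReader(io.StringIO(orders_csv)))

joined = list(hash_join(orders, customers, "customer_id"))
```

The join keys are compared as strings here because the CSV reader does not infer types. A real engine has to reconcile types across sources before it can match keys.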

MapR brings together the key technologies needed to achieve high scale and high reliability in a fully distributed architecture that spans on-premises, cloud, and multi-cloud deployments, including edge-first IoT, while dramatically lowering both the hardware and operational costs of your most important applications and data.

CUSTOMERS USING MAPR DATABASE

  • comScore
  • Sanchez