MapR Event Store for Apache Kafka

The first massively scalable publish-subscribe event streaming system built into a unified data platform

What Is MapR Event Store for Apache Kafka?

MapR Event Store for Apache Kafka is the first massively scalable publish-subscribe event streaming system built into a unified data platform. It is the only publish-subscribe streaming system to support global event replication reliably at IoT scale. MapR Event Store for Apache Kafka supports the Kafka API and includes out-of-the-box integration with popular streaming frameworks such as Spark Streaming and Kafka Streams. The MapR Event Store for Apache Kafka is integrated with MapR XD Distributed File and Object Store as well as MapR Database, resulting in the most comprehensive dataware for businesses to run nearly any workload on a single cluster in production.

WHY MAPR EVENT STORE?

  • Challenges with Previous Technologies


  • Delayed Processing and Insights

  • Modern businesses are being overwhelmed by the onslaught of data continuously created by diverse sources such as web applications, social media, sensors, connected devices, and machine logs. Although the data is created constantly, it has typically been consumed for transformation, movement, or processing by scheduled batch processes. This introduces data pipeline complexity and precludes the ability to respond immediately to new information.

  • Challenges with Data Integration

  • Point-to-point integrations between data sources and applications can quickly become disorganized, complicated, and tightly coupled.

  • Challenges with Traditional Message Queues

  • Traditional message queues cannot handle the data volumes of the Internet of Things, with millions of sources, hundreds of destinations, and the demand for real-time analytics.

  • Challenges with Geographic Dispersion

  • Diverse data sources are often geographically distributed, sending data to the closest data center for low latency. This distributed data needs to be centralized and joined with data from enterprise applications to paint a complete picture of the state of business.

  • Architectural Complexity

  • Businesses typically deploy data transport systems and data processing systems in separate clusters. This makes it harder to analyze new data from the streams in real time and adds the administrative overhead of managing separate clusters.

  • Challenges with Kafka

  • Companies have several challenges operating Kafka at scale including:
    • Partition load balancing and hotspotting problems.
    • Partition storage is limited to the storage capacity of a single node, because Kafka requires each partition to fit within the disk space of one cluster node; a partition cannot be split across machines.
    • Kafka’s mirroring design simply forwards messages to a mirror cluster and has several limitations, including expensive rebalancing, difficulty adding topics, possible data loss, and lack of metadata synchronization, which means consumers and producers cannot automatically fail over from one cluster to a mirror.

  • MapR Event Store Overcomes Challenges with Previous Technologies


  • Continuous Real-Time Data Processing

  • MapR Event Store makes real-time data directly available for processing. Real-time data can be processed by stream processing frameworks such as Spark Streaming, Kafka Streams, or KSQL to enable sub-second response and automated actions.

  • Decoupled Publish/Subscribe API

  • Topics are logical collections of events that organize events into categories and decouple producers from consumers. Decoupled communication through the publish/subscribe Kafka API makes it easy to add new listeners or new publishers without disrupting existing processes.

  • Designed to Scale

  • With MapR Event Store, topics are partitioned for throughput and scalability. Partitions are dynamically balanced according to load, automatically spreading data and processing across all nodes in the cluster. MapR Event Store can scale to very high throughput levels, easily delivering millions of messages per second on modest hardware. Producers and consumers will automatically load balance across partitions, enabling applications to scale linearly with increasing data rates.

  • Global Availability and Scalability

  • MapR Event Store is designed for geographically dispersed systems, with real-time global replication. Streams can be replicated in a master-slave, many-to-one, or multi-master configuration between thousands of geographically distributed clusters. Data created at multiple geographical locations can be processed in real time to get a complete state-of-the-business picture.
  • Producers and consumers can fail over between distributed clusters for high availability.

  • Unified Data Platform

  • MapR Event Store brings together data transport and data processing in the same cluster. Batch, interactive, and stream processing frameworks have direct access to event streams, eliminating data movement and ensuring consistency.

  • Designed for Scalability, High Availability, Reliability, and Performance

  • MapR Event Store is differentiated by its proven enterprise features such as global replication, security and multi-tenancy, and high availability/disaster recovery (HA/DR), all of which it inherits from the MapR Data Platform. Not only does MapR Event Store address major operational deficiencies in Kafka, it also delivers higher performance: it transports data at higher rates, with larger message sizes, and to more topics than Kafka can on a similarly sized cluster.
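To make the topic, partition, and consumer-group ideas above concrete, here is a minimal in-memory sketch in Python. It does not use the Kafka API or any MapR client. The class names, the topic name, and the CRC32-based partitioner are assumptions chosen for illustration. The sketch shows three things:

- Records with the same key land in the same partition.
- Partitions are spread across the members of a consumer group.
- Two groups reading the same topic keep separate offsets, so adding a new consumer does not disturb existing ones.

```python
import zlib
from collections import defaultdict


class Topic:
    """In-memory model of a partitioned topic (illustration only)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an append-only list of (key, value) records.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Deterministic hash: the same key always maps to the same partition,
        # which preserves per-key ordering.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p


class ConsumerGroup:
    """A group of consumers that shares the partitions of one topic."""

    def __init__(self, topic, members):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset to read
        self.assignment = defaultdict(list)
        # Spread partitions across members round-robin.
        for p in range(len(topic.partitions)):
            self.assignment[members[p % len(members)]].append(p)

    def poll(self, member):
        """Return all unread records from the member's partitions."""
        records = []
        for p in self.assignment[member]:
            log = self.topic.partitions[p]
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return records


topic = Topic("sensor-readings", num_partitions=4)
for i in range(8):
    topic.produce(f"pump-{i % 3}", {"reading": i})

# Two independent groups read the same topic without affecting each other.
analytics = ConsumerGroup(topic, ["a1", "a2"])
archiver = ConsumerGroup(topic, ["b1"])
analytics_records = analytics.poll("a1") + analytics.poll("a2")
archived_records = archiver.poll("b1")
```

In a real deployment, the Kafka API client chooses partitions and assigns them within a group. The sketch only mirrors the idea.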

KEY BENEFITS OF MAPR EVENT STORE

UNIFIED PLATFORM

The MapR Data Platform supports storage and processing of files, database tables, and event streams.

INFINITE PERSISTENCE

With MapR Event Store, partitions are dynamically balanced across the cluster as events are persisted, so a partition is not limited to the disk space of a single node and events can be persisted indefinitely.

UNLIMITED SCALE

Capacity and performance scale linearly as servers are added to a cluster, with each server handling more than 1 million messages per second.

HIGHLY RELIABLE

MapR Event Store inherits the proven high availability and disaster recovery capabilities of the MapR Data Platform.

To ensure the highest levels of uptime, intra-cluster replication eliminates single points of failure and safeguards against multiple failures within a cluster.

Events are reliably delivered from producers to consumers with no data loss.

INTEGRATED SECURITY

The MapR Data Platform provides fine-grained authorization with Access Control Expressions (ACEs) at the stream level. Users are authenticated with Kerberos, Linux PAM, and/or LDAP.
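ACEs combine users and groups with boolean operators to decide who may access a stream. The sketch below is a simplified, hypothetical evaluator. It assumes a `u:<user>` / `g:<group>` / `p` (public) syntax combined with `|`, `&`, `!`, and parentheses. In this sketch, `!` binds tighter than `&`, which binds tighter than `|`. It only shows how such an expression decides access. It is not MapR's parser and ignores roles and any other syntax the platform supports.

```python
import re

# Simplified ACE tokens (an assumption for this sketch): parentheses,
# operators, u:<name>, g:<name>, or p (public).
TOKEN = re.compile(r"\s*(\(|\)|\||&|!|[ug]:[\w.-]+|p)")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad expression at position {pos}: {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr, user, groups):
    """Return True if `user` (a member of `groups`) satisfies the expression."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence: a | b
        result = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse, to consume the tokens
            result = result or rhs
        return result

    def parse_and():  # a & b
        result = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():  # highest precedence: !a
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        if tok == "p":
            return True  # public: everyone matches
        if tok[:2] in ("u:", "g:"):
            kind, name = tok.split(":", 1)
            return name == user if kind == "u" else name in groups
        raise ValueError(f"unexpected token {tok!r}")

    value = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return value
```

For example, `evaluate("g:ops & !u:mallory", user, groups)` grants access to members of a hypothetical `ops` group except the user `mallory`.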

The MapR Data Platform offers wire-level encryption for all data to and from producers and consumers as well as data replicated between globally distributed clusters – all under a unified security model between MapR services.

GLOBAL REPLICATION

The MapR Data Platform provides reliable stream replication across arbitrary topologies, supporting thousands of clusters across the globe.

Topologies of connected clusters include one-to-one, one-to-many, many-to-one, many-to-many, star, ring, and mesh. Loops are automatically handled to avoid data duplication.
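One common way to allow loops in a topology without duplicating data is to give every message a unique ID. Each cluster then drops any ID it has already stored. The Python sketch below floods a message through a fully meshed, multi-master set of clusters using that technique. The `Cluster` class, the `(origin, sequence)` ID format, and the flooding strategy are illustrative assumptions. They do not describe how MapR implements loop handling.

```python
from collections import deque


class Cluster:
    def __init__(self, name):
        self.name = name
        self.peers = []    # clusters this cluster replicates to
        self.seen = set()  # IDs of messages already stored here
        self.log = []      # messages stored locally, in arrival order


def publish(origin, message_id, payload):
    """Store a message at `origin`, then replicate it along every link.

    Returns how many clusters stored the message. Loops terminate because a
    cluster that has already seen an ID drops it instead of forwarding it.
    """
    stored = 0
    queue = deque([origin])
    while queue:
        cluster = queue.popleft()
        if message_id in cluster.seen:
            continue  # arrived again through a loop: drop, do not re-forward
        cluster.seen.add(message_id)
        cluster.log.append((message_id, payload))
        stored += 1
        queue.extend(cluster.peers)
    return stored


# Three clusters in a full mesh, every link bidirectional (multi-master).
a, b, c = Cluster("a"), Cluster("b"), Cluster("c")
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]
stored_first = publish(a, ("a", 1), "temp=21")
stored_second = publish(b, ("b", 1), "temp=19")
```

Without the `seen` check, this mesh would forward each message forever. With it, every cluster stores each message exactly once.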

Stream metadata is replicated alongside data, allowing producers and consumers to fail over between sites for high availability and ensuring business continuity should a site-wide disaster occur.
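The following sketch shows why replicating consumer metadata matters for failover. The replica site receives both the events and the consumer group's committed offset. A consumer that switches sites therefore resumes where it left off instead of starting over. The `Site` class, the site names, and the group name are hypothetical. Replication here is an explicit function call rather than the continuous process a real platform runs.

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.log = []        # topic data replicated to this site
        self.committed = {}  # consumer group -> next offset to read
        self.up = True


def replicate_site(src, dst):
    """Copy new events and consumer-group offsets from src to dst."""
    dst.log.extend(src.log[len(dst.log):])
    dst.committed.update(src.committed)


def consume(sites, group, max_records):
    """Read from the first live site, resuming at the group's committed offset."""
    site = next(s for s in sites if s.up)
    start = site.committed.get(group, 0)
    batch = site.log[start:start + max_records]
    site.committed[group] = start + len(batch)  # commit after reading
    return site.name, batch


primary, secondary = Site("us-east"), Site("eu-west")
primary.log.extend(f"event-{i}" for i in range(6))

first_site, first_batch = consume([primary, secondary], "billing", 4)
replicate_site(primary, secondary)  # events and the committed offset move together
primary.up = False                  # simulate a site-wide outage
second_site, second_batch = consume([primary, secondary], "billing", 4)
```

Suppose the outage struck before `replicate_site()` ran. The secondary would then hold an older offset (or none), and the consumer would re-read events it had already processed. The sketch makes that trade-off visible.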

Demo Video: MapR Event Store (formerly MapR Streams) in Action

In this video we show how MapR Event Store for Apache Kafka enables global high availability and failover in a multi-datacenter environment, including handling failover of both consumers and producers of data.

KEY FUNCTIONAL USE CASES FOR MAPR EVENT STORE

STREAM PROCESSING

MapR Event Store provides the ingest, transport, and buffering layer for stream processing frameworks like Spark Streaming or Kafka Streams to enable real-time operations such as calculations and aggregations on data as it’s delivered.
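Here is an example of an aggregation computed on data as it is delivered. The generator below computes a per-key average over fixed, non-overlapping (tumbling) time windows. It emits each window's results as soon as the next window starts. This is plain Python, not Spark Streaming or Kafka Streams code. It assumes events arrive in timestamp order; production frameworks add handling for late and out-of-order events. The sensor keys are hypothetical.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Yield (window_start, key, average) each time a window closes.

    events: iterable of (timestamp_seconds, key, value), assumed to arrive
    in timestamp order.
    """
    current = None
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, key, value in events:
        start = ts - ts % window_seconds
        if current is not None and start != current:
            # A newer window has started, so the previous one is complete.
            for k in sorted(sums):
                yield current, k, sums[k] / counts[k]
            sums.clear()
            counts.clear()
        current = start
        sums[key] += value
        counts[key] += 1
    for k in sorted(sums):  # flush the last open window at end of input
        yield current, k, sums[k] / counts[k]


readings = [(0, "pump-1", 10.0), (3, "pump-1", 20.0), (4, "pump-2", 5.0), (11, "pump-1", 30.0)]
results = list(tumbling_window_avg(readings, window_seconds=10))
```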

DATABASE CHANGE CAPTURE

Change data capture streams inserts, updates, and deletes from the operational system of record to other systems, keeping them synchronized with it.
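A change capture consumer replays insert, update, and delete events, in order, against a downstream copy. The sketch below assumes a hypothetical event format of `(operation, primary_key, row)`, where `row` is the full row after the change. It applies the events to a dictionary that stands in for the target system.

```python
def apply_changes(replica, changes):
    """Apply ordered change events to a replica keyed by primary key.

    Each change is (operation, primary_key, row); `row` is the full row after
    the change and is ignored for deletes.
    """
    for op, key, row in changes:
        if op in ("insert", "update"):
            replica[key] = dict(row)
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown change type: {op!r}")
    return replica


changes = [
    ("insert", 1, {"name": "Ana", "tier": "silver"}),
    ("insert", 2, {"name": "Raj", "tier": "gold"}),
    ("update", 1, {"name": "Ana", "tier": "gold"}),
    ("delete", 2, None),
]
replica = apply_changes({}, changes)
```

Order matters here: applying the update before the insert would leave the wrong row. Keying change events by primary key would send every change for a row to the same partition. Because order is preserved within a partition, that row's changes would reach the replica in the order they happened.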

APPLICATION LOGS AND METRICS DELIVERY

MapR Event Store can provide a pipeline for log/metrics data coming out of appliances, servers, and applications, making them available to infrastructure monitoring systems for alerting, dashboarding, and search.
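On the consuming side of such a pipeline, an alerting service might read metric events and fire when a host stays above a threshold. The sketch below is a hypothetical example of that logic. It assumes `(timestamp, host, value)` events and raises one alert each time a host exceeds the threshold for a given number of consecutive readings. The host names and values are made up for illustration.

```python
from collections import defaultdict


def alerts(metrics, threshold, consecutive):
    """Yield (host, timestamp) once per run of `consecutive` readings above threshold.

    metrics: iterable of (timestamp, host, value) events in arrival order.
    """
    streak = defaultdict(int)  # host -> current run of high readings
    for ts, host, value in metrics:
        if value > threshold:
            streak[host] += 1
            if streak[host] == consecutive:
                yield host, ts
        else:
            streak[host] = 0  # a normal reading resets the run


cpu = [(1, "web1", 95), (2, "web1", 97), (3, "web2", 99),
       (4, "web1", 98), (5, "web1", 50), (6, "web1", 96)]
fired = list(alerts(cpu, threshold=90, consecutive=3))
```

Requiring several consecutive readings, instead of alerting on one spike, is a common way to reduce noisy alerts. The right values depend on the metric.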

KEY VERTICAL USE CASES FOR MAPR EVENT STORE

AD TECH

  • Real-time user targeting based on segment and preferences.

OIL AND GAS

  • Real-time monitoring of pumps/rigs.

FINANCIAL SERVICES

  • Real-time fraud detection.
  • Real-time mobile notifications.

RETAIL

  • Build an intelligent supply chain by placing sensors or RFID tags on items to alert if items aren’t in the right place or proactively order more if supply is low.
  • Smart logistics with real-time end-to-end tracking of delivery trucks.

TELECOMMUNICATIONS

  • Real-time antenna optimization based on user location data.
  • Real-time charging and billing based on customer usage, with up-to-date usage dashboards for users.
  • Mobile offers.
  • Optimized advertising for video/audio content based on what users are consuming.

HEALTHCARE

  • Smart hospitals – collect data and readings from hospital devices (vitals, IVs, MRI, etc.) and analyze and alert in real time.
  • Biometrics – collect and analyze data from patient devices that monitor vitals while outside of care facilities.

WHY MAPR EVENT STORE MATTERS TO YOU

CIO / ENTERPRISE ARCHITECT

IT / STORAGE ADMINISTRATOR

DEVELOPERS / DATA ENGINEERS

DATA SCIENTISTS

  • Easier, faster time to insight:
  • Bring machine learning models close to IoT data sources for anomaly detection, fraud detection, and sensor monitoring.
  • Combine event streams with machine learning to handle the logistics of machine learning in a flexible way by:
    • Making input and output data available to independent consumers
    • Managing and evaluating multiple models and easily deploying new models
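The pattern in the list above, where models act as independent consumers of a shared input stream, can be sketched as follows. Because the input stream is persisted, a model deployed later can replay it from the beginning. Its scores are then directly comparable with the existing model's output. The `ModelRunner` class, the model names, and the toy scoring rules that stand in for trained models are all hypothetical.

```python
class ModelRunner:
    """Hypothetical consumer that scores an input stream with one model."""

    def __init__(self, name, model):
        self.name = name
        self.model = model
        self.offset = 0  # this model's own position in the input stream

    def run(self, input_log, output_log):
        """Score every unread input event and append results to output_log."""
        for event in input_log[self.offset:]:
            output_log.append(
                {"id": event["id"], "model": self.name, "score": self.model(event)}
            )
        self.offset = len(input_log)


inputs = [{"id": i, "amount": amt} for i, amt in enumerate([12.0, 950.0, 40.0])]
scores = []

champion = ModelRunner("rule-v1", lambda e: 1.0 if e["amount"] > 500 else 0.0)
champion.run(inputs, scores)

# A model deployed later replays the same persisted input from offset 0.
challenger = ModelRunner("rule-v2", lambda e: min(e["amount"] / 1000, 1.0))
challenger.run(inputs, scores)

# Group both models' scores by input event for side-by-side evaluation.
by_event = {}
for s in scores:
    by_event.setdefault(s["id"], {})[s["model"]] = s["score"]
```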

Why Is MapR Event Store for Apache Kafka with the MapR Data Platform Better?

A confluence of several technology shifts has dramatically changed big data and machine learning applications. The combination of distributed computing, streaming analytics, and machine learning is accelerating the development of next-generation intelligent applications that take advantage of modern computational paradigms and infrastructure. The MapR Data Platform integrates global event streaming, real-time database capabilities, and scalable enterprise storage with Hadoop, HBase, Spark, Drill, and machine learning libraries to power this new generation of data processing pipelines and intelligent applications. Diverse and open APIs allow all types of analytics workflows to run on the data in place:

  • The MapR XD Distributed File and Object Store is designed to store data at exabyte scale, support trillions of files, and combine analytics and operations in a single platform. MapR XD supports industry-standard protocols and APIs, including POSIX, NFS, S3, and HDFS. Unlike Apache HDFS, which is write-once and append-only, the MapR Data Platform delivers a true read-write, POSIX-compliant file system. Support for the HDFS API enables Spark and Hadoop ecosystem tools for both batch and streaming to interact with MapR XD. Support for POSIX enables Spark and all non-Hadoop libraries to read and write to the distributed data store as if the data were mounted locally, which greatly expands the possible use cases for next-generation applications. Support for an S3-compatible API means MapR XD can also serve as the foundation for Spark applications that leverage object storage.
  • The MapR Event Store for Apache Kafka is the first big-data-scale streaming system built into a unified data platform and the only big data streaming system to support global event replication reliably at IoT scale. Support for the Kafka API enables Spark Streaming applications to interact with data in real time in a unified data platform, which minimizes maintenance and data copying.
  • MapR Database is a high-performance NoSQL database built into the MapR Data Platform. MapR Database is multi-model: wide-column, key-value with the HBase API or JSON (document) with the OJAI API. MapR Database can also be queried with Apache Spark or Apache Drill SQL:
    • The MapR Database Connector for Apache Spark enables users to do the following:
      • Use MapR Database as a sink for Spark Streaming
      • Perform complex SQL queries and updates on top of MapR Database while applying critical techniques such as projection and filter pushdown, custom partitioning, and data locality
    • Apache Drill is an open source distributed SQL query engine that delivers fast and secure self-service BI SQL analytics at scale. Drill supports a variety of SQL and NoSQL databases and file systems, including MapR Database, MapR XD Distributed File and Object Store, HDFS, HBase, MongoDB, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS, and local files. A single query can join data from multiple datastores.

MapR brings together the key technologies needed for high scale and high reliability in a fully distributed architecture that spans on-premises, cloud, and multi-cloud deployments, including edge-first IoT, while dramatically lowering both the hardware and operational costs of your most important applications and data.

CUSTOMERS USING MAPR EVENT STORE

  • Quantium
  • NTT Security
