Today we released MapR 6.0.1 and MEP 5.0, ushering in new capabilities and enhancements across the product, from the core platform to MapR Database, Data Science Refinery, and Drill. The area that saw the most action was MapR Event Store, which gained several enhancements that improve developer productivity, application compatibility, and its ability to serve as a system of record.
"Stream of record" is a design pattern that we have seen take off in enterprises since MapR-ES (now called MapR Event Store) debuted two years ago. One of our MapR customers, Liaison Technologies, provides a compelling case study of both the technical and business benefits of this approach. In this blog, I'll walk through how we're making these architectures even more flexible.
Spark Structured Streaming is a new stream processing API that makes creating real-time analytic applications easier than ever on the MapR Data Platform. Compared to the DStreams API in previous versions of Apache Spark, Structured Streaming presents a functional, SQL-like API that hides the gory details about how Spark works under the hood. At the same time, it introduces a vast library of useful analytic functions, like event-time windowing and aggregations as well as a new interface for data sources and sinks. For example, with Spark Structured Streaming, users can write a few lines of code to build a data pipeline that consumes from MapR Event Store, performs transformations, and inserts the results into a MapR Database JSON table.
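To make "event-time windowing" a little more concrete, here is a minimal plain-Python sketch of the idea. It is not Spark code, and the event shape, the five-minute window size, and the function names are all assumptions made for illustration. It groups events into tumbling windows by the timestamp each event carries, not by when it arrived. In Structured Streaming, the `window` function in `pyspark.sql.functions`, combined with `groupBy`, plays this role.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(event_time, size):
    """Return the start of the tumbling window that contains event_time."""
    # Integer-divide the time since the epoch by the window size,
    # then scale back up to land on the window boundary.
    return EPOCH + ((event_time - EPOCH) // size) * size


def count_by_window(events, size=timedelta(minutes=5)):
    """Count events per (window start, key) using each event's own timestamp."""
    counts = Counter()
    for event_time, key in events:
        counts[(window_start(event_time, size), key)] += 1
    return counts


# Hypothetical events as (event time, key). The 12:06 reading is listed
# before the 12:04 one to mimic out-of-order arrival.
sample = [
    (datetime(2018, 3, 1, 12, 1, tzinfo=timezone.utc), "sensor-a"),
    (datetime(2018, 3, 1, 12, 6, tzinfo=timezone.utc), "sensor-a"),
    (datetime(2018, 3, 1, 12, 4, tzinfo=timezone.utc), "sensor-a"),
]
counts = count_by_window(sample)
```

Because the grouping key comes from the event's timestamp, the late 12:04 reading still lands in the 12:00 window alongside the 12:01 reading, which is the property that distinguishes event-time windows from processing-time ones.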
Stream-of-record users will notice that they can leverage the same API as traditional batch-oriented Spark (Spark SQL and Datasets) to do real-time processing. With only a few small code changes, an application can either process the full history of data that has moved through a stream or begin continuous processing on new data as it comes in. This capability also comes in handy for machine learning, as models can be trained using Spark ML on historical message data, then deployed into production using Structured Streaming.
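As a rough illustration of how small that code change can be, the sketch below uses the `startingOffsets` option of Spark's Kafka source to switch between replaying a stream from the beginning and picking up only new messages. Treat the specifics as assumptions: the helper names are hypothetical, the topic name is made up (it follows the `/stream-path:topic` form), and the bootstrap server value is a placeholder for the option that stock Spark requires. Check the MapR documentation for what your cluster expects.

```python
def kafka_source_options(topic, replay_history, bootstrap_servers="placeholder:9092"):
    """Build options for Spark's Kafka source (hypothetical helper).

    replay_history=True starts from the earliest retained message;
    False starts from messages that arrive after the query begins.
    """
    return {
        # Placeholder value; an assumption, not a MapR requirement.
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest" if replay_history else "latest",
    }


def open_stream(spark, topic, replay_history):
    """Return a streaming DataFrame for topic, given an existing SparkSession."""
    reader = spark.readStream.format("kafka")
    for name, value in kafka_source_options(topic, replay_history).items():
        reader = reader.option(name, value)
    return reader.load()


# Hypothetical topic name, for illustration only.
history_opts = kafka_source_options("/apps/demo-stream:events", replay_history=True)
live_opts = kafka_source_options("/apps/demo-stream:events", replay_history=False)
```

The two option sets differ in a single value. Note that `startingOffsets` only applies when a query starts fresh; a query restarted from a checkpoint resumes from its saved offsets.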
When your stream is your system of record, you will often want to query your historical event data. Some examples include:
Apache Drill 1.13, released in MEP 5.0, now has a storage plugin that talks directly to MapR Event Store through the Kafka API to achieve the above use cases and more.
Given the newness of this feature, we're marking it "preview" for the time being. Take it for a spin and let us know what you think! We don't recommend using it for production applications.
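If you want to try it, one way to submit a query is through Drill's REST API. The sketch below only builds the HTTP request; it does not send it. The plugin name `kafka`, the topic name, and the Drill address are assumptions for illustration, and I haven't confirmed here that a stream path is accepted as a table name in exactly this form, so adjust it to match your storage plugin configuration. Authentication is not handled.

```python
import json
from urllib.request import Request


def build_drill_query(sql, drill_url="http://localhost:8047"):
    """Build a POST request for Drill's /query.json REST endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return Request(
        url=drill_url.rstrip("/") + "/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical plugin name and topic; both are assumptions.
sql = "SELECT * FROM kafka.`/apps/demo-stream:events` LIMIT 10"
request = build_drill_query(sql)
```

Passing `request` to `urllib.request.urlopen` would send the query to a running Drillbit and return the result rows as JSON.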
We've made several enhancements to the core MapR Event Store API to make applications and analytics more robust and to make new use cases possible. Those improvements include:
The above represent the most significant new API changes to MapR Event Store as part of its adoption of the Kafka 1.0 API. There are a couple of Kafka 1.0 features not yet supported on MapR Event Store, so when in doubt, check our documentation to see what's supported.
When you put all of these new capabilities together with a stream of record architecture, you can easily imagine a user that sits down with a cup of coffee and does the following:
This release represents just a small taste of what we have planned for MapR Event Store in 2018. I can't wait to report back in a few months with some of the other stuff we've been working on.