Big data platforms are changing the way we manage data. Legacy systems often require throwing away older data, moving large data sets from one silo to another, or spending exorbitant amounts to handle growth. But those are becoming the modus operandi of the past. Scale, speed, and agility are front and center with the modern data architectures that are designed for big data. Data integrity, security, and reliability remain critical goals as well. The notion of a “converged application” represents the next generation of business applications for today and the future.
Converged applications are software applications that can simultaneously process both operational and analytical data, allowing real-time, interactive access to both current and historical data. This class of applications delivers real-time analytics, high-frequency decisioning, and other solution architectures that require immediate operations on large volumes of data.
Converged applications provide real-time access to large volumes of data in an efficient architecture to cost-effectively drive combined operational and analytical workloads on big data. They are often deployed in a modular architecture, especially as microservices that work together as a cohesive unit, not as monolithic processes in distinct data silos that require continual data movement. This architecture leads to greater responsiveness, better decisions, less complexity, lower cost, and lower risk.
The blueprint consists of a sample financial services application that serves as a tutorial and starting point for a converged application that includes high-speed streaming. Written for the MapR Converged Data Platform, the application and its included code examples (written in Java, with data generation in Python) show how to take advantage of the advanced streaming capabilities of MapR and how to create a service that scales predictably according to business needs.
To get started with the blueprint, review the overview below, then download the code from GitHub and follow the instructions in the README in the GitHub repository. You can install the MapR Converged Community Edition as a platform for running the blueprint by going to mapr.com/download, or you can run the example on a single-node VM instance, which can be downloaded at mapr.com/sandbox.
Beyond its role as a tutorial, the sample application can serve as a useful architecture example for developing streaming applications.
The purpose of the application is to provide a service that moves ticks between the offerer (or sender) and those to whom the offer is extended (the recipients), and to enable interactive analytics on the large stream of data. Both senders and recipients are customers of the service, and each will occasionally want to review the offers they have made and the offers they have received.
Financial "tick" and trade data is ingested on the left side of the diagram. The data consists of actual trades in the fixed-width New York Stock Exchange (NYSE) format, as well as simulated "Level 2" bid and ask data leading up to each trade. A microservice consumes the stream and provides fast indexing by sender.
Each entry of the input data has the following format:
{time, sender, id, symbol, prices, ..., [recipient*]}
Each entry can have only one sender, but potentially many recipients.
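As a rough illustration of that shape, an entry could be modeled in Java along the following lines. This is a minimal sketch, not the blueprint's actual class: the class name Tick, the field types, and the choice to reject an empty sender are all assumptions, and the fields elided by "..." in the format above are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory model of one input entry. Names and types are
// assumptions; the elided fields ("...") of the real format are left out.
final class Tick {
    final long time;               // event time; the unit is an assumption
    final String sender;           // exactly one sender per entry
    final String id;
    final String symbol;
    final List<Double> prices;     // assumed to be numeric values
    final List<String> recipients; // zero or more recipients

    Tick(long time, String sender, String id, String symbol,
         List<Double> prices, List<String> recipients) {
        if (sender == null || sender.isEmpty()) {
            throw new IllegalArgumentException("an entry needs exactly one sender");
        }
        this.time = time;
        this.sender = sender;
        this.id = id;
        this.symbol = symbol;
        // Defensive copies keep the entry immutable after construction.
        this.prices = Collections.unmodifiableList(new ArrayList<>(prices));
        this.recipients = Collections.unmodifiableList(new ArrayList<>(recipients));
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        Tick tick = new Tick(1L, "S1", "id-1", "ABC",
                Arrays.asList(10.5, 10.6), Arrays.asList("R1", "R2"));
        System.out.println(tick.sender + " -> " + tick.recipients);
    }
}
```

Keeping the sender as a single field and the recipients as a list mirrors the one-to-many relationship described above.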
The application processes data at a high rate (over 300,000 messages per second). The provided Java source code shows how to develop an application that sustains this throughput across the entire production environment, including how to partition topics and how to index the data so that it is both persistent and available for interactive queries.
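One way to picture the indexing idea is a lookup from each participant to the stream offsets of the messages that mention it. The sketch below is a simplified, in-memory illustration only and is not the blueprint's implementation: the class ParticipantIndex and its method names are hypothetical, and a real index would need to be persisted rather than held in a HashMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: maps each sender and recipient to the offsets of the
// messages that involve them, so a query does not have to scan the stream.
final class ParticipantIndex {
    private final Map<String, List<Long>> bySender = new HashMap<>();
    private final Map<String, List<Long>> byRecipient = new HashMap<>();

    // Records one message: a single sender and any number of recipients.
    void add(long offset, String sender, List<String> recipients) {
        bySender.computeIfAbsent(sender, k -> new ArrayList<>()).add(offset);
        for (String recipient : recipients) {
            byRecipient.computeIfAbsent(recipient, k -> new ArrayList<>()).add(offset);
        }
    }

    // Offsets of the offers a sender has made, in insertion order.
    List<Long> offersMadeBy(String sender) {
        return Collections.unmodifiableList(
                bySender.getOrDefault(sender, Collections.emptyList()));
    }

    // Offsets of the offers a recipient has received, in insertion order.
    List<Long> offersReceivedBy(String recipient) {
        return Collections.unmodifiableList(
                byRecipient.getOrDefault(recipient, Collections.emptyList()));
    }

    public static void main(String[] args) {
        ParticipantIndex index = new ParticipantIndex();
        index.add(0L, "S1", Arrays.asList("R1", "R2"));
        index.add(1L, "S2", Arrays.asList("R1"));
        System.out.println("Offers made by S1: " + index.offersMadeBy("S1"));
        System.out.println("Offers received by R1: " + index.offersReceivedBy("R1"));
    }
}
```

Storing offsets rather than copies of the messages keeps the index small and leaves the stream itself as the single source of the data.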
To provide predictable scalability, multiple MapR Streams (Kafka API) consumers can be started within the application. They automatically load-balance across partitions, so the application scales as data rates increase. With the provided indexing techniques, the stream can also be queried directly as the "system of record".
Ready to check it out? Take the next step and get a MapR cluster running for free, and visit the GitHub page to view the source code of the application.
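The effect of adding consumers can be illustrated with a toy model of partition assignment. The sketch below is not the assignment logic used by MapR Streams or Kafka, which is handled by the platform; it uses a simple round-robin rule, and the class PartitionBalanceSketch and the consumer names are hypothetical. It only shows how each consumer's share of partitions shrinks as consumers are added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of load balancing: partitions 0..partitionCount-1 are dealt out
// to consumers round-robin. Real assignment is performed by the platform.
final class PartitionBalanceSketch {

    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        // LinkedHashMap keeps consumers in the order they were supplied.
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Six partitions shared by two consumers, then by three.
        System.out.println(assign(Arrays.asList("c1", "c2"), 6));
        System.out.println(assign(Arrays.asList("c1", "c2", "c3"), 6));
    }
}
```

With six partitions, going from two consumers to three reduces each consumer's share from three partitions to two. In this model, consumers beyond the number of partitions would receive nothing, which is why the partition count bounds the useful degree of parallelism.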