A Converged Application Blueprint for Financial Data


How to Use the Blueprint

The blueprint consists of a sample financial services application that serves as a tutorial and starting point for a converged application that includes high-speed streaming. Written for the MapR Converged Data Platform, the application (with code examples in Java and data generation in Python) shows how to take advantage of MapR's advanced streaming capabilities and how to create a service that scales predictably with business needs.

To get started using the blueprint, review the overview below, then download the code from GitHub and follow the instructions in the repo's README. You can install the MapR Converged Community Edition as a platform for running the blueprint; get started with the installation at mapr.com/download, or run the example on a single-node VM instance available at mapr.com/sandbox.

Architecture of the Financial Services Example Application (Blueprint)

Beyond serving as a tutorial, the sample application is a useful architectural reference for developing streaming applications.

The application provides a service that moves ticks from the party making an offer (the sender) to the parties to whom the offer is extended (the recipients), and enables interactive analytics on the resulting large stream of data. Both senders and recipients are customers of the service, and each will occasionally want to review what offers they have made and what offers they have received.

Financial "tick" and trade data is ingested on the left side of the diagram. This consists of actual trades, consisting of the fixed-width New York Stock Exhange (NYSE) format, as well as simulated "Level 2" bid and ask data leading up to each trade. A microservice consumes the stream and provides fast indexing by sender.

Each entry of the input data has format:

{time, sender, id, symbol, prices, ..., [recipient*]}

Each entry can have only one sender, but potentially many recipients.
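
As a concrete illustration, here is a minimal sketch of how one such entry might be represented in Java. The class and field names are assumptions for illustration, not the blueprint's actual types:

    // Hypothetical value class for one tick entry; fields mirror the
    // {time, sender, id, symbol, prices, ..., [recipient*]} format above.
    import java.util.Collections;
    import java.util.List;

    public class Tick {
        private final long time;                // event timestamp
        private final String sender;            // exactly one sender per entry
        private final String id;                // unique message id
        private final String symbol;            // ticker symbol
        private final double[] prices;          // price fields from the feed
        private final List<String> recipients;  // zero or more recipients

        public Tick(long time, String sender, String id, String symbol,
                    double[] prices, List<String> recipients) {
            this.time = time;
            this.sender = sender;
            this.id = id;
            this.symbol = symbol;
            this.prices = prices;
            this.recipients = recipients;
        }

        public String getSender() { return sender; }

        public List<String> getRecipients() {
            return Collections.unmodifiableList(recipients);
        }
    }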

The application runs at a high rate of data processing (over 300,000 messages per second). The provided Java source code shows how to develop an application that sustains this level of throughput across the entire production environment, including how to partition topics and how to index data so that it is both persistent and able to support interactive queries.
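
To make the partitioning point concrete, here is a minimal producer sketch using the Kafka API against a MapR Streams topic. The stream path /user/mapr/taq:trades and the sample message are assumptions for illustration; the key idea is that keying each record by sender groups a sender's ticks in one partition, which supports the sender index described above.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TickProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            // With MapR Streams the full stream:topic path names the
            // destination; plain Apache Kafka would also need bootstrap.servers.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String json = "{\"time\":1477500000,\"sender\":\"sender-042\","
                        + "\"symbol\":\"IBM\",\"prices\":[159.10]}";
                // Key by sender: all of one sender's ticks land in the same
                // partition, preserving per-sender order for the index.
                producer.send(new ProducerRecord<>("/user/mapr/taq:trades",
                        "sender-042", json));
                producer.flush();
            }
        }
    }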

To provide predictable scalability, multiple MapR Streams (Kafka API) consumers can be started within the application; they automatically load-balance across partitions, so the application scales with increasing data rates. Using the provided indexing techniques, the stream itself can be queried directly as the "system of record."
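
The following is a minimal consumer sketch of that load-balancing behavior, assuming the same hypothetical stream path and a group name chosen for illustration: every consumer started with the same group.id is automatically assigned a share of the topic's partitions, so starting more instances raises total throughput.

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TickConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Consumers sharing a group.id split the topic's partitions
            // among themselves and rebalance as instances come and go.
            props.put("group.id", "tick-indexers");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("/user/mapr/taq:trades"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(200);
                for (ConsumerRecord<String, String> r : records) {
                    // Here the blueprint's microservice would index the
                    // tick by sender for later interactive queries.
                    System.out.printf("%s -> %s%n", r.key(), r.value());
                }
            }
        }
    }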

Ready to check it out? Take the next step: get a MapR cluster running for free, and visit the GitHub page to view the application's source code.


This blog post was published October 26, 2016.