Keeping these different systems up to date requires an architecture that can synchronize them in real time as data is updated. Furthermore, meeting audit requirements in healthcare requires the ability to apply granular cross-datacenter replication policies to data and to provide detailed lineage information for each record. This post describes how stream-first architectures can solve these challenges, and looks at how this has been implemented at Liaison Technologies.
MapR Event Store is a new distributed messaging system that enables producers and consumers to exchange events in real time via the Apache Kafka 0.9 API. Topics are logical collections of messages that organize events into categories.
Topics are partitioned, spreading the load for parallel messaging across multiple servers, which provides for faster throughput and scalability.
Partitions, which exist within topics, are parallel and ordered sequences of messages that are continually appended to. You can think of a partition as a queue: events are delivered in the order they are received.
Unlike a queue, events are persisted. Even after they are delivered, they remain on the partition, available to other consumers.
Older messages are automatically deleted based on the stream's time-to-live setting; a setting of 0 means messages are never deleted.
Messages are not deleted from Topics when read, and topics can have multiple different consumers. This allows processing of the same messages by different consumers for different purposes.
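These read semantics can be made concrete with a minimal Python sketch (an illustration of the concept, not the MapR or Kafka API): the partition is an append-only list, reads never remove anything, and each consumer keeps its own offset.

```python
class Partition:
    """An append-only, ordered sequence of messages."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)       # producers only ever append
        return len(self.messages) - 1   # offset of the new message

    def read_from(self, offset):
        return self.messages[offset:]   # reading deletes nothing

class Consumer:
    """Each consumer tracks its own cursor into the partition."""
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0

    def poll(self):
        batch = self.partition.read_from(self.offset)
        self.offset += len(batch)
        return batch

p = Partition()
for msg in ("deposit", "withdrawal", "deposit"):
    p.append(msg)

# Two independent consumers each see the full sequence of messages.
analytics, audit = Consumer(p), Consumer(p)
print(analytics.poll())  # ['deposit', 'withdrawal', 'deposit']
print(audit.poll())      # same messages again -- reads are independent
```

Because consuming is just advancing a private offset, any number of consumers can process the same topic for different purposes without interfering with one another.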
Imagine that each “event” is an incremental update to an entry in a database. In this case, the state of a particular entry is simply the accumulation of events pertaining to that entry. This is sometimes called the duality of streams and tables. In a banking example, the stream persists the full sequence of deposit and withdrawal events, and the database table persists the current account balances.
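The duality fits in a few lines of Python (a hypothetical sketch, not Liaison's code): folding over the event stream reproduces the table of current balances.

```python
from collections import defaultdict

# The stream: an ordered log of (account, event_type, amount) events.
events = [
    ("acct-1", "deposit", 100),
    ("acct-2", "deposit", 250),
    ("acct-1", "withdrawal", 40),
    ("acct-1", "deposit", 10),
]

def balances(event_log):
    """Fold the stream into a table of current account balances."""
    table = defaultdict(int)
    for account, kind, amount in event_log:
        table[account] += amount if kind == "deposit" else -amount
    return dict(table)

print(balances(events))  # {'acct-1': 70, 'acct-2': 250}
```

The table is a pure function of the log: replaying the same events always yields the same balances, which is exactly why the stream can reconstruct the database but not vice versa.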
Which one of these, the stream or the database, makes a better system of record? The events in the stream can be used to reconstruct the current account balances in the database, but not the other way around. Database replication itself works this way: the primary writes changes to a change log, and replicas apply those changes locally.
With a Stream, events can also be re-processed to create a new index, cache, or view of the data.
The Consumer simply reads from the oldest message to the latest to create a new view of the data.
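Building a new view really is just another full scan of the log, oldest message first. Continuing the hypothetical banking sketch, the same event log can be replayed into a completely different read structure, here a per-account activity index:

```python
# Same shape of event log as before: (account, event_type, amount).
events = [
    ("acct-1", "deposit", 100),
    ("acct-2", "deposit", 250),
    ("acct-1", "withdrawal", 40),
]

def activity_index(event_log):
    """A new read-side view: every event each account has ever seen."""
    index = {}
    for account, kind, amount in event_log:  # oldest message first
        index.setdefault(account, []).append((kind, amount))
    return index

print(activity_index(events)["acct-1"])
# [('deposit', 100), ('withdrawal', 40)]
```

Nothing about the original balance view had to change; the new index was derived independently from the same persisted events.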
While a stream can be used as a proxy for a database, it isn't a replacement. There are lots of databases out there, each built on different technology and optimized for a particular write or read pattern: graph queries, search, documents, and so on. What if you need the same set of data in different databases, to serve different types of queries? The stream can act as the distribution point for multiple databases, with each one providing a different read pattern. All changes to application state are persisted to an event store, which is the system of record. The event store supports rebuilding state by replaying the events in the stream; this is the Event Sourcing pattern.
Events funnel out to databases, which act as consumers of the stream. This polyglot persistence provides different specialized materialized views. Using a different model for reading than for writing is the Command Query Responsibility Segregation (CQRS) pattern. Martin Kleppmann called this approach “turning the database inside out.”
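A toy CQRS sketch makes the separation visible (illustrative names and in-memory structures, not the ALLOY implementation): commands can only append events to the log, while two independent read models stay current by consuming that log.

```python
class EventStore:
    """System of record: an append-only log that fans events out."""
    def __init__(self):
        self.log = []
        self.subscribers = []

    def append(self, event):
        self.log.append(event)
        for handler in self.subscribers:
            handler(event)  # push the event to every read model

# Command side: the only way to change state is to record an event.
def deposit(store, account, amount):
    store.append(("deposit", account, amount))

def withdraw(store, account, amount):
    store.append(("withdrawal", account, amount))

# Query side: two specialized views built from the same stream.
balances = {}   # current balance per account
history = {}    # ordered list of event types per account

def update_balances(event):
    kind, account, amount = event
    delta = amount if kind == "deposit" else -amount
    balances[account] = balances.get(account, 0) + delta

def update_history(event):
    history.setdefault(event[1], []).append(event[0])

store = EventStore()
store.subscribers += [update_balances, update_history]

deposit(store, "acct-1", 100)
withdraw(store, "acct-1", 30)
print(balances["acct-1"])  # 70
print(history["acct-1"])   # ['deposit', 'withdrawal']
```

In a real deployment each read model would be a separate database (graph, search, document) consuming the stream, but the shape is the same: one write path, many independently derived views.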
There are several other use cases for modeling databases with streams:
Liaison Technologies has been doing B2B integration for about 16 years: taking data in from trading partners, transforming it into the format the target expects, and then sending it out. Customers started asking Liaison to hold on to the data and empower them to gain greater insights. Liaison now provides customers both integration and data management capabilities to master and explore data on a single unified platform the company calls ALLOY. The Liaison ALLOY™ Platform and its ALLOY Health™ Platform, developed for the specific needs of the healthcare and life sciences industries, also facilitate better analysis of quality data through the customer's choice of data analytics and business intelligence tools. ALLOY's architecture provides streaming data processing and real-time visibility into a single flow of cleansed, harmonized data from, and available to, all applications.
Let's take a look at a specific use case for Liaison’s ALLOY Health™ platform.
Georgia Health Connect (GaHC) is a state health information network. Data from hospitals, providers, and practices flows into the ALLOY Health Platform, where it can be explored and analyzed with Clinical Data Viewer, Reporting, and Analytics interfaces. There is also a connection to a Health Information Exchange, where other systems contribute data to the state of Georgia. Records from doctors, hospitals, and lab orders are stored in ALLOY Health and can support population health queries.
You have to store all of the data in order to be able to answer these types of queries over time.
To learn more, read the complete case study about how Liaison Powers Georgia Health Connect.
Of course, these use cases have to sit in an environment compliant with privacy and security standards. The entire ALLOY Platform has been certified compliant with major standards for ensuring data privacy, security and trust: the federal privacy standard for identifiable health data (HIPAA), the security standard for payment card information (PCI DSS), and the control standard for service organizations (SOC 2).
Data breaches are happening at an alarming rate. In 2015, there were 253 healthcare breaches with a combined loss of over 112 million records. Target lost $440 million in the quarter after its security breach, and the cost of Anthem's security breach was about $12 billion.
The relational database is no longer the only tool; you can now use the most appropriate tool for the shape of your data and for the shape of your queries. With a graph database, you can use relationships between entities to perform queries that are very difficult to do in a relational database. Take patient matching as an example: with contextual data in a graph, you can determine things like two patients with similar names went to the same doctor and got the same prescription, and be 95% sure that they are the same.
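The patient-matching idea can be sketched even without a graph database. This hypothetical Python snippet models the relationships as edges and flags two patient records that share both a doctor and a prescription as likely duplicates (the names, threshold logic, and the 95% figure in the text are illustrative):

```python
# Edges: (patient, relation, target) -- a tiny property-graph stand-in.
edges = [
    ("Jon Smith",  "saw_doctor", "Dr. Adams"),
    ("Jon Smith",  "prescribed", "lisinopril"),
    ("John Smyth", "saw_doctor", "Dr. Adams"),
    ("John Smyth", "prescribed", "lisinopril"),
    ("Ann Lee",    "saw_doctor", "Dr. Baker"),
]

def neighbors(patient):
    """All (relation, target) pairs reachable from a patient record."""
    return {(rel, tgt) for p, rel, tgt in edges if p == patient}

def likely_same(a, b):
    """Records sharing both a doctor and a prescription are match candidates."""
    shared = neighbors(a) & neighbors(b)
    return (any(rel == "saw_doctor" for rel, _ in shared)
            and any(rel == "prescribed" for rel, _ in shared))

print(likely_same("Jon Smith", "John Smyth"))  # True
print(likely_same("Jon Smith", "Ann Lee"))     # False
```

In a real graph database this becomes a short traversal query instead of a scan, and that is the point: relationship-shaped questions that are painful joins in a relational store are natural graph lookups.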
You no longer have to cram your use case into a relational database, but tools like Kafka were built by companies like LinkedIn, originally for their own use, without much thought given to compliance. There is a gap between these tools and the problems of operating in regulated industries.
Now let’s look at how a stream-first architecture has been implemented at Liaison.
Compliance auditors are content when they can see:
With stream replication, you can create a backup copy of a stream for producers and consumers to fail over to if the original stream goes offline.
The non-streaming, non-big-data things that you have to do for compliance are also riveting:
Compliance is hard, so you need to make sure you get the lineage, auditing, and replication features from your big data systems; if they're not there, you have to build them yourself. MapR Event Store helped hugely with this, but you also have to have compliant software processes and procedures.
The Liaison ALLOY Health Platform design/architecture is not completely new. It builds upon these design patterns:
The Liaison ALLOY Health Platform design solved some problems, and MapR solved a lot of other issues:
All of the components of the use case architecture we just discussed can run on the same cluster with the MapR Data Platform.
To find out more: