August 29, 2016 | BY Carol McDonald
Building a robust, responsive, and secure data service for healthcare is tricky. For starters, healthcare data lends itself to multiple models:
- Document representation for patient profile views or updates
- Graph representation to query relationships between patients, providers, and medications
- Search representation for advanced lookups
Keeping these different systems up to date requires an architecture that can synchronize them in real time as data changes. Furthermore, meeting audit requirements in healthcare means being able to apply granular cross-datacenter replication policies to data and to provide detailed lineage information for each record. This post describes how stream-first architectures solve these challenges, and looks at how this has been implemented at Liaison Technologies.
Stream as a System of Record Concepts
MapR Streams is a new distributed messaging system that enables producers and consumers to exchange events in real time via the Apache Kafka 0.9 API. Topics are logical collections of messages that organize events into categories.
Partition for Good Concurrency
Topics are partitioned, spreading the load for parallel messaging across multiple servers, which provides for faster throughput and scalability.
Each Partition is a Queue
Partitions, which exist within topics, are parallel, ordered sequences of messages that are continually appended to. You can think of each partition as a queue: events are delivered in the order they are received.
Unlike a queue, events are persisted. Even after they are delivered, they remain on the partition, available to other consumers.
Older messages are automatically deleted based on the stream's time-to-live setting; if the setting is 0, they are never deleted.
Messages are not deleted from Topics when read, and topics can have multiple different consumers. This allows processing of the same messages by different consumers for different purposes.
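These properties can be sketched in plain Python (a stand-in for the real messaging system; class and method names here are illustrative, not MapR Streams APIs). Note that consuming a message advances only that consumer's own offset; the log itself is untouched:

```python
from collections import defaultdict

class Partition:
    """A partition: an ordered, append-only sequence of messages."""
    def __init__(self):
        self.log = []                    # messages persist; reads never remove them
        self.offsets = defaultdict(int)  # each consumer tracks its own read position

    def produce(self, message):
        self.log.append(message)         # events are appended in arrival order

    def consume(self, consumer_id):
        """Deliver this consumer's next unread message, or None if caught up."""
        pos = self.offsets[consumer_id]
        if pos >= len(self.log):
            return None
        self.offsets[consumer_id] += 1
        return self.log[pos]

p = Partition()
p.produce({"event": "deposit", "amount": 100})
p.produce({"event": "withdraw", "amount": 30})

# Two independent consumers each read the same messages for different purposes.
first_for_audit = p.consume("audit-service")      # {'event': 'deposit', 'amount': 100}
first_for_balance = p.consume("balance-service")  # the same message, delivered again
```

Because the log is shared and offsets are per consumer, any number of consumers can process the full history at their own pace.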
The Stream is the System of Record
Imagine that each “event” is an incremental update to an entry in a database. In this case, the state of a particular entry is simply the accumulation of events pertaining to that entry. This is sometimes called the duality of streams and tables. In the example below, the Stream persists the queue of all deposit and withdrawal events, and the database table persists the current account balances.
Which one of these, the Stream or the Database, makes a better system of record? The events in the Stream can be used to reconstruct the current account balances in the Database, but not the other way around. Database replication itself works this way: the primary writes changes to a change log, and replicas apply those changes locally.
With a Stream, events can also be re-processed to create a new index, cache, or view of the data.
The Consumer simply reads from the oldest message to the latest to create a new view of the data.
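This stream-table duality can be shown with a small sketch (plain Python, with a hypothetical event shape): folding over the full event log reconstructs the current balances table.

```python
def replay(events):
    """Rebuild current account balances by folding over the full event log."""
    balances = {}
    for e in events:
        delta = e["amount"] if e["type"] == "deposit" else -e["amount"]
        balances[e["account"]] = balances.get(e["account"], 0) + delta
    return balances

events = [
    {"account": "BradA", "type": "deposit",  "amount": 100},
    {"account": "BradA", "type": "withdraw", "amount": 60},
]
replay(events)  # {'BradA': 40}
```

The reverse is impossible: given only the final balance of 40, there is no way to recover the individual deposit and withdrawal events.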
Event Sourcing (ES) and Command Query Responsibility Segregation (CQRS) or Turning the Database Upside Down
While a stream can serve as a proxy for a database, it isn't a replacement: there are many databases out there, each using a different technology optimized for a particular write or read pattern, such as graph query, search, or document retrieval. What if you need the same set of data in different databases, to serve different types of queries? The Stream can act as the distribution point for multiple databases, with each one providing a different read pattern. All changes to application state are persisted to an event store, which is the system of record. The event store supports rebuilding state by replaying the events in the stream; this is the Event Sourcing pattern.
Events funnel out to databases, which are consumers of the stream. This polyglot persistence provides different specialized materialized views. Using a different model for reading than for writing is the Command Query Responsibility Segregation (CQRS) pattern. Martin Kleppmann calls this turning the database upside down.
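A minimal sketch of this fan-out (plain Python; the view functions and event fields are illustrative): each "database" is just another consumer of the same stream, maintaining the view shape it is best at.

```python
def document_view(events):
    """Document-style view: the full event history grouped per patient."""
    docs = {}
    for e in events:
        docs.setdefault(e["patient"], []).append(e)
    return docs

def search_view(events):
    """Search-style view: an inverted index from event type to patients."""
    index = {}
    for e in events:
        index.setdefault(e["type"], set()).add(e["patient"])
    return index

events = [
    {"patient": "p1", "type": "lab"},
    {"patient": "p2", "type": "lab"},
    {"patient": "p1", "type": "rx"},
]
document_view(events)  # {'p1': [...lab..., ...rx...], 'p2': [...lab...]}
search_view(events)    # {'lab': {'p1', 'p2'}, 'rx': {'p1'}}
```

Both views are derived from the same stream, so either can be dropped and rebuilt at any time by replaying the log.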
There are several other use cases for modeling databases with streams:
- Lineage: how did BradA’s balance get so low?
- Auditing: who deposited/withdrew from the account ID BradA?
- Rewind: to see what the state of the accounts was last year.
- Integrity: can I trust that the data hasn’t been tampered with? (yes, because Streams are immutable)
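The lineage and auditing cases above fall out of the log almost for free. As a rough sketch (plain Python, hypothetical event fields), answering "how did BradA's balance get so low?" is just a filtered replay with a running balance:

```python
def lineage(events, account):
    """Audit trail: every event that touched the account, with running balance."""
    trail, balance = [], 0
    for e in events:
        if e["account"] != account:
            continue
        balance += e["amount"] if e["type"] == "deposit" else -e["amount"]
        trail.append({**e, "balance_after": balance})
    return trail

events = [
    {"account": "BradA", "type": "deposit",  "amount": 100},
    {"account": "Other", "type": "deposit",  "amount": 10},
    {"account": "BradA", "type": "withdraw", "amount": 60},
]
lineage(events, "BradA")  # two events, balances 100 then 40
```

"Rewind" is the same idea with a timestamp filter instead of an account filter.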
Revolutionizing Healthcare Data Architecture at Liaison Technologies, Inc.
Liaison Technologies has been doing B2B integration for about 16 years: taking data in from trading partners, transforming it into the form the target system expects, and then sending it out. Customers began asking Liaison to retain the data and empower them to gain greater insights from it. Liaison now provides customers both integration and data management capabilities to master and explore data on a single unified platform the company calls ALLOY. The Liaison ALLOY™ Platform and its ALLOY Health™ Platform, developed for the specific needs of the healthcare and life sciences industries, also facilitate better analysis with quality data through the customer's choice of data analytics and business intelligence tools. ALLOY's architecture provides streaming data processing and real-time visibility into a single flow of cleansed, harmonized data from, and available to, all applications.
Let's take a look at a specific use case for Liaison’s ALLOY Health™ platform.
Liaison ALLOY Health and Georgia Health Connect (GaHC)
Georgia Health Connect (GaHC) is a state health information network. Data from hospitals, providers, and practices flows into the ALLOY Health Platform, where it can be explored and analyzed with Clinical Data Viewer, Reporting, and Analytics interfaces. There is also a connection to a Health Information Exchange where other systems contribute data to the state of Georgia. Records from doctors, hospitals, and lab orders are stored in ALLOY Health, and can provide population health queries like:
- What are the outcomes in the entire state on diabetes?
- Are there doctors that are doing this better than others?
You have to store all of the data in order to be able to answer these types of queries over time.
To learn more, read the complete case study about how Liaison Powers Georgia Health Connect.
Of course, these use cases have to sit in an environment compliant with privacy and security standards. The entire ALLOY Platform has been certified compliant with major standards for ensuring data privacy, security and trust: the federal privacy standard for identifiable health data (HIPAA), the security standard for payment card information (PCI DSS), and the control standard for service organizations (SOC 2).
The Financial Impact of Healthcare Breaches
Data breaches are happening at an alarming rate: in 2015, there were 253 healthcare breaches with a combined loss of over 112 million records. Target lost $440 million in the quarter after its security breach, and the cost of Anthem's security breach was about $12 billion.
The Relational Database Is Not the Only Tool
The relational database is no longer the only tool; you can now use the most appropriate tool for the shape of your data and the shape of your queries. With a graph database, you can use relationships between entities to perform queries that are very difficult to do in a relational database. Take patient matching as an example: with contextual data in a graph, you can determine that two patients with similar names went to the same doctor and got the same prescription, and be 95% sure that they are the same person.
- Patient id 1234, name Jon Smith, visited doctor 86, and was prescribed prescription 9876
- Patient id 999, name Johnathan Smith, visited doctor 86, and was prescribed prescription 9876 at same dosage
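A simplistic sketch of this kind of contextual matching (plain Python rather than a real graph database; the visit fields are illustrative): group patients by the (doctor, prescription) context they share, and flag pairs that co-occur.

```python
from collections import defaultdict
from itertools import combinations

def candidate_matches(visits):
    """Flag patient pairs who saw the same doctor and got the same prescription."""
    by_context = defaultdict(set)
    for v in visits:
        by_context[(v["doctor"], v["rx"])].add(v["patient"])
    pairs = set()
    for patients in by_context.values():
        for a, b in combinations(sorted(patients), 2):
            pairs.add((a, b))
    return pairs

visits = [
    {"patient": "1234", "doctor": 86, "rx": 9876},
    {"patient": "999",  "doctor": 86, "rx": 9876},
]
candidate_matches(visits)  # {('1234', '999')}
```

A real matcher would also score name similarity and dosage to produce a confidence like the 95% figure above; this only finds the candidate pairs.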
Mind the Gap
You no longer have to cram your use case into a relational database, but tools like Kafka were built by companies like LinkedIn, originally for their own use, without much thought given to compliance. There is a gap between these tools and the problems of operating in regulated industries.
Applied “Stream System of Record” at Liaison
Now let’s look at how a stream-first architecture has been implemented at Liaison.
- Pre-processing is done on raw data coming in to obfuscate sensitive information before persisting events in the immutable log (Stream).
- The MapR-Streams immutable log is set to never throw data away.
- The events become the system of record and can be processed by different consumers based on the use case and permissions.
- Different workflows (Onyx or Apache Spark) read from the stream topic and store the data in different materialized views: MapR-DB HBase tables, MapR-DB JSON documents, graph databases, and search databases, so that services and apps always have the most up-to-date view of the data in the most appropriate format.
- Microservices do interesting things, like patient matching or matching lab orders to results, with the data in the best tool for the job.
- Events can also be replayed and loaded into a new system.
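The pre-processing step in this pipeline can be sketched as follows (a minimal illustration, not Liaison's actual implementation; the field names and salt are assumptions): sensitive fields are replaced with salted hashes before the event ever reaches the immutable log, so nothing in the stream needs to be deleted later.

```python
import hashlib

SENSITIVE_FIELDS = {"name", "ssn"}  # illustrative; a real list comes from policy

def obfuscate(record, salt="per-tenant-salt"):
    """Replace sensitive fields with a salted hash before the record is persisted."""
    out = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            out[key] = digest[:12]  # stable token; same input maps to same token
        else:
            out[key] = value
    return out

raw = {"name": "Jon Smith", "dob": "1980-01-01", "doctor": 86}
safe = obfuscate(raw)  # name is tokenized; dob and doctor pass through
```

Because the same input always maps to the same token, downstream consumers can still join and match records without ever seeing the raw values.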
Smiling Compliance Auditors
Compliance auditors are content when they can see:
- Data lineage: the stream provides an infinite, immutable log of each data change. Data is never deleted, so changes are always traceable.
- Audit logging: who has written to, updated, or, in some cases, even read the data.
- Wire-level encryption: data in transit has to be encrypted.
- Data at rest encryption: persisted data has to be encrypted.
Replication for Disaster Recovery
With Streams replication you can create a backup copy of a stream for producers and consumers to fail over to if the original stream goes offline.
The non-streaming, non-big-data things that you have to do for compliance are also riveting:
- Software development lifecycle has to be documented
- System hardening
- Dev vs. Ops: developers cannot have access to production
- Patch management
Compliance is hard, so you need to make sure you get the lineage, auditing, and replication features from your big data systems; if they're not there, you have to build them yourself. MapR Streams helped hugely with this, but you also have to have compliant software processes and procedures.
The Liaison ALLOY Health Platform design/architecture is not completely new. It builds upon these design patterns:
- Turning the database upside down
- Event Sourcing, Command Query Responsibility Segregation, Polyglot Persistence
- Kappa Architecture
The Liaison ALLOY Health Platform design solved some problems, and MapR solved a lot of other issues:
- Unified security was huge.
- Replication from data center to data center.
- Fewer things to manage: converging multiple clusters for Kafka/HBase/Spark/Hadoop into one cluster.
- Multi-tenancy: Kafka has a maximum number of topics per cluster, and we needed a lot of topics. With MapR Streams, we are able to have essentially unlimited topics for lots of tenants.
All of the components of the use case architecture we just discussed can run on the same cluster with the MapR Converged Data Platform.
To find out more: