5 Big Data Production Examples in Healthcare


Editor's Note: This blog post is an excerpt from the MapR Guide to Big Data in Healthcare. To read more, you can download it here.

Healthcare costs are driving the demand for big data-driven healthcare applications. Technology decision-makers in healthcare systems cannot ignore the increased efficiencies, the attractive economics, and the rapid pace of innovation that can now be applied to delivering and paying for healthcare. Many are finding that new standards and incentives for the digitizing and sharing of healthcare data — along with improvements and decreasing costs in storage and parallel processing on commodity hardware — are causing a big data revolution in healthcare with the goal of better care at lower cost.

The healthcare industry can benefit immensely from the use of advanced analytics and big data technologies, and the MapR Data Platform offers the perfect solution. In this post, we will look at 3 big data production examples in healthcare.

1. Liaison Technologies: Streaming System of Record for Healthcare

Liaison Technologies provides cloud-based solutions to help organizations integrate, manage, and secure data across the enterprise. One vertical solution they provide is for the healthcare and life sciences industry, which comes with two challenges: meeting HIPAA compliance requirements and handling the proliferation of data formats and representations. With MapR Event Store, the data lineage portion of the compliance challenge is solved because the stream becomes a system of record: an infinite, immutable log of each data change. To illustrate the second challenge, a patient record may be consumed in different ways (as a document representation, as a graph representation, or through search) by different users, such as pharmaceutical companies, hospitals, clinics, and physicians. By streaming data changes in real time to MapR Database (HBase and JSON document), graph, and search databases, users always have the most up-to-date view of the data in the most appropriate format. Further, by implementing this service on the MapR Data Platform, Liaison is able to secure all of the data components together, avoiding the data and security silos that alternative solutions require.
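
The stream-as-system-of-record idea can be illustrated with a minimal, in-memory Python sketch. It is not Liaison's implementation and does not use any MapR API; the names ChangeLog, ChangeEvent, document_view, and search_view are hypothetical and exist only for illustration. The sketch shows an append-only log of changes from which two different representations of the same patient data are derived, while the full change history stays available for lineage.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One immutable record of a change to a patient record (hypothetical shape)."""
    offset: int
    patient_id: str
    field: str
    value: Any


class ChangeLog:
    """Append-only log: events are added at the end and never modified."""

    def __init__(self) -> None:
        self._events: List[ChangeEvent] = []

    def append(self, patient_id: str, field: str, value: Any) -> ChangeEvent:
        event = ChangeEvent(len(self._events), patient_id, field, value)
        self._events.append(event)
        return event

    def read_from(self, offset: int = 0) -> Tuple[ChangeEvent, ...]:
        # Return a tuple so callers cannot mutate the underlying log.
        return tuple(self._events[offset:])


def document_view(events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Document-style view: the latest value of each field, grouped by patient."""
    docs: Dict[str, Dict[str, Any]] = {}
    for e in events:
        docs.setdefault(e.patient_id, {})[e.field] = e.value
    return docs


def search_view(events: Iterable[ChangeEvent]) -> Dict[str, Set[str]]:
    """Search-style view: inverted index from lowercase token to patient IDs."""
    index: Dict[str, Set[str]] = {}
    # Index the current state, so overwritten values do not appear in search.
    for patient_id, doc in document_view(events).items():
        for value in doc.values():
            for token in str(value).lower().split():
                index.setdefault(token, set()).add(patient_id)
    return index


log = ChangeLog()
log.append("p1", "diagnosis", "atrial fibrillation")
log.append("p2", "diagnosis", "hypertension")
log.append("p1", "diagnosis", "atrial flutter")

docs = document_view(log.read_from())    # current state per patient
index = search_view(log.read_from())     # same data, search representation
history = [e.value for e in log.read_from() if e.patient_id == "p1"]  # lineage
```

Because every view is rebuilt from the same log, each consumer can choose its own representation, and the log itself answers the lineage question of which value was recorded and in what order.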

2. Novartis Genomics

Next Generation Sequencing (NGS) is a classic big data application that deals with the dual challenge of vast amounts of raw heterogeneous data and the fact that best practices in NGS research are an actively moving target. Additionally, much of the cutting-edge research requires heavy interaction with diverse data from external organizations. This research requires workflow tools that are robust enough to process vast amounts of raw NGS data yet flexible enough to keep up with quickly changing research techniques. It also requires a way to meaningfully integrate data from Novartis with data from these large external organizations (such as 1000 Genomes, NIH’s GTEx (Genotype-Tissue Expression), and TCGA (The Cancer Genome Atlas)), paying particular attention to clinical, phenotypical, experimental, and other associated data.

The Novartis team chose Hadoop and Apache Spark to build a workflow system that allows them to integrate, process, and analyze diverse data for Next Generation Sequencing (NGS) research, while being responsive to advances in the scientific literature.

3. Healthcare IoT Startup: Working to Classify Heart Conditions Faster

The current heart rhythm analysis process is slow, and classification is done manually. Data is uploaded in batches from the devices to the analysis machines, where medical analysts review the classification data and then submit a report to the doctors and hospitals, who make medical decisions about the patients. The process takes over 24 hours, a long lag before doctors can access the patient data, which increases the risk of medical emergencies.

With MapR XD, Telemed will now be able to ingest data from various medical devices directly via NFS into their cluster for real-time patient insight. The solution needed to be highly available and to provide multi-tenancy (because of HIPAA) as they start hosting data from various hospitals' patients and from medical device companies. Being able to segment that data by customer was really important.
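
One simple way to picture segmenting data by customer on an NFS-mounted cluster is to give each tenant its own directory subtree. The Python sketch below illustrates only that idea and is not the startup's actual design: the mount point /mnt/cluster/ingest, the .ecg file extension, and the tenant_path function are all assumptions made for the example.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Hypothetical mount point for the cluster's NFS export.
MOUNT_ROOT = PurePosixPath("/mnt/cluster/ingest")

# Restrict IDs to a safe character set so a crafted ID such as "../other"
# cannot escape its own tenant directory.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def tenant_path(tenant_id: str, device_id: str, recorded_at: datetime) -> PurePosixPath:
    """Build the per-tenant path for one device upload.

    recorded_at is expected to be timezone-aware; it is normalized to UTC.
    """
    for name, value in (("tenant_id", tenant_id), ("device_id", device_id)):
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"invalid {name}: {value!r}")
    stamp = recorded_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Assumed layout: <mount>/<tenant>/<device>/<UTC timestamp>.ecg
    return MOUNT_ROOT / tenant_id / device_id / f"{stamp}.ecg"


path = tenant_path(
    "hospital_a",
    "monitor-17",
    datetime(2017, 2, 27, 12, 0, 0, tzinfo=timezone.utc),
)
```

Keeping each tenant under a separate subtree means access controls and quotas can be applied per directory. Whether that alone is sufficient for HIPAA depends on the rest of the deployment.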

With the help of MapR Professional Services, they have been able to build out a solution in time for their July 18th HIPAA review deadline, with an architecture that meets all the requirements for high availability, multi-tenancy, and real-time insights. The CEO has met his commitment and deadline to his investors, and the company is on track to start selling its SaaS solutions in Q3/Q4.


Improving patient outcomes at the same or even lower cost is an extraordinarily tall order for any healthcare provider, given that overall costs of healthcare are rising in the US at a lofty 15% clip. Full-scale digital transformation is the key to making this goal a reality, with digitization, enhanced communications, and big data analytics being the legs that support the transformation effort. The many emerging use cases for big data analytics are intimately tied to the ability of Hadoop-based solutions to acquire and store massive quantities of disparate data, both structured and unstructured, from just about any source and present it for in-depth analysis.

In selecting a big data platform and in particular a Hadoop distribution, be sure the platform is highly adept at handling the mix of data types in healthcare typically housed in silos, with clinical data in one silo, pharmaceutical data in another, and logistics information on hospital supplies in yet another. This platform should be flexible enough so that caregivers can use complex data like doctors’ notes and imaging files for real patient analysis, not just for archiving.

This blog post was published February 27, 2017.
