5 Big Data Trends in Healthcare for 2017


This blog post is an excerpt from the MapR Guide to Big Data in Healthcare. To read more, you can download it here.

The healthcare industry, perhaps more than any other, is on the brink of a major transformation through the use of advanced analytics and big data technologies.

In this post, we’re going to talk about 5 big data trends in healthcare for 2017.

1. Value-Based, Patient-Centric Care

A goal of modern healthcare systems is to provide optimal health care through the meaningful use of health information technology in order to:

  • Improve healthcare quality and coordination, so that outcomes are consistent with current professional knowledge
  • Reduce healthcare costs; reduce avoidable overuse
  • Provide support for reformed payment structures

Health payors such as insurers and public health systems (e.g., Medicare and Medicaid) are in the early stages of shifting from fee-for-service compensation to value-based data-driven incentives that reward high quality, cost-effective patient care and demonstrate meaningful use of electronic health records. This approach requires significant improvements in reporting, claims processing, data management, and process automation.

The focus on value-based care corresponds with an increased focus on patient-centric care. By leveraging technology and focusing healthcare processes on patient outcomes and a continuum of care, doctors, hospitals, and health insurers need to work with each other to personalize care that is efficient and price-conscious, transparent in its delivery and billing, and measured by patient satisfaction.

Thus, the goal now is to move more decisively away from the long-standing fee-for-service practice by which providers are paid. In essence, providers get paid for seeing and treating patients. Currently, there is little or no reward when providers improve quality of services, boost patient outcomes, or reduce costs. Fee-for-service has been a major roadblock to investing in digital solutions that would, say, improve patient outcomes, because providers cannot recoup their investments. As one senior executive at KPMG put it, “Instead of rewarding leaders for transforming healthcare, our systems reward leaders for making narrow improvements within them.”

Current thinking around long-standing, crucial payment practices is beginning to change, paving the way for a robust digital transformation of healthcare.

2. The Healthcare Internet of Things (IoT)

The Internet of Things, also called the Industrial Internet, refers to the rapidly increasing number of smart, interconnected devices and sensors and the tidal volumes of data they will generate and move between devices, and ultimately to people. Spending on healthcare IoT could top $120 billion in just four years, by some estimates. Most of the data created by the healthcare IoT is unstructured, creating a major role for Hadoop and for advanced big data analytics working within the Hadoop framework.

Today, a variety of devices monitor every sort of patient measurement, from glucose monitors and fetal monitors to electrocardiograms and blood pressure monitors. Many of these measurements require a follow-up visit with a physician. But smarter monitoring devices communicating with other patient devices could greatly refine this process, possibly lessening the need for direct physician intervention and maybe replacing it with a phone call from a nurse. Other smart devices already in place, such as smart dispensers, can detect whether medicines are being taken regularly at home. If not, they can initiate a call or other contact from providers to get patients properly medicated. The possibilities offered by the healthcare IoT to lower costs and improve patient care are almost limitless.

3. Reducing Fraud, Waste, and Abuse

The cost of fraud, waste, and abuse in the healthcare industry is a key contributor to spiraling healthcare costs in the United States, but big data analytics can be a game changer for detecting healthcare fraud. The Centers for Medicare and Medicaid Services prevented more than $210.7 million in healthcare fraud in one year using predictive analytics. UnitedHealthcare transitioned to a predictive modeling environment based on a Hadoop big data platform in order to identify inaccurate claims in a systematic, repeatable way, and generated a 2200% return on its investment in big data and advanced technology.

The key to identifying fraud is the ability to store large unstructured datasets of historical claims, to go back and analyze that history, and to use machine learning algorithms to detect anomalies and patterns.

Healthcare organizations can analyze patient records and billing to detect anomalies such as a hospital’s overutilization of services in short time periods, patients receiving healthcare services from different hospitals in different locations simultaneously, or identical prescriptions for the same patient filled in multiple locations.
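Two of the checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layouts, the field names (`patient`, `hospital`, `start`, `end`, `rx_id`, `pharmacy`), and the sample data are all hypothetical, and a production system would run comparable logic over far larger claim histories.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical claim and prescription-fill records (illustrative only).
claims = [
    {"patient": "P1", "hospital": "A", "start": "2017-01-05 09:00", "end": "2017-01-05 11:00"},
    {"patient": "P1", "hospital": "B", "start": "2017-01-05 10:00", "end": "2017-01-05 12:00"},
    {"patient": "P2", "hospital": "A", "start": "2017-01-06 09:00", "end": "2017-01-06 10:00"},
]
fills = [
    {"patient": "P1", "rx_id": "RX100", "pharmacy": "North"},
    {"patient": "P1", "rx_id": "RX100", "pharmacy": "South"},
    {"patient": "P2", "rx_id": "RX200", "pharmacy": "North"},
]


def parse(ts):
    """Parse the timestamp format assumed in the sample records."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def overlapping_visits(claims):
    """Flag patients billed at two different hospitals for overlapping times."""
    by_patient = defaultdict(list)
    for claim in claims:
        by_patient[claim["patient"]].append(claim)

    flagged = []
    for patient, visits in by_patient.items():
        visits.sort(key=lambda c: parse(c["start"]))
        for i, a in enumerate(visits):
            for b in visits[i + 1:]:
                if parse(b["start"]) >= parse(a["end"]):
                    break  # visits are sorted by start, so nothing later overlaps a
                if a["hospital"] != b["hospital"]:
                    flagged.append((patient, a["hospital"], b["hospital"]))
    return flagged


def duplicate_prescriptions(fills):
    """Flag the same prescription for the same patient filled at several locations."""
    locations = defaultdict(set)
    for fill in fills:
        locations[(fill["patient"], fill["rx_id"])].add(fill["pharmacy"])
    return {key: sorted(locs) for key, locs in locations.items() if len(locs) > 1}


print(overlapping_visits(claims))
print(duplicate_prescriptions(fills))
```

A flag from either check is a reason to review a claim, not proof of fraud; legitimate transfers between facilities, for example, can produce overlapping records.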

One major healthcare provider took a data lake approach, aggregating massive volumes of data into a data hub for various departments, including fraud prevention. As a result, the provider is on the way to capturing an incremental 20% of fraud, waste, and abuse in its claims department.

The Centers for Medicare and Medicaid Services uses predictive analytics to assign risk scores to specific claims and providers, to identify billing patterns and claim aberrancies difficult to detect by previous methods. Rules-based models flag certain charges automatically. Anomaly models raise suspicion, based on factors that seem improbable. Predictive models compare charges against a fraud profile and raise suspicion. Graph models raise suspicion based on the relations of a provider; fraudulent billers are often organized as tight networks.
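To make the idea of layered models concrete, here is a toy sketch of how rule, anomaly, and graph signals might be combined into a single risk score. It is not how the Centers for Medicare and Medicaid Services implements its scoring; the unit cap, the z-score cutoff, the equal weighting, and every field name are assumptions chosen for illustration, and the predictive (fraud profile) model is left out because it would need labeled training data.

```python
from statistics import mean, pstdev


def rule_flag(claim, max_units=10):
    """Rules-based model: flag a charge that exceeds a fixed unit cap (hypothetical rule)."""
    return claim["units"] > max_units


def anomaly_score(amount, peer_amounts):
    """Anomaly model: distance of a charge from peer charges, in standard deviations."""
    sigma = pstdev(peer_amounts)
    if sigma == 0:
        return 0.0
    return abs(amount - mean(peer_amounts)) / sigma


def graph_flag(provider, links, known_bad):
    """Graph model: flag a provider directly linked to a known fraudulent biller."""
    return any(neighbor in known_bad for neighbor in links.get(provider, ()))


def risk_score(claim, peer_amounts, links, known_bad, z_cutoff=3.0):
    """Add one point per model that raises suspicion (equal weights are an assumption)."""
    score = 0.0
    if rule_flag(claim):
        score += 1.0
    if anomaly_score(claim["amount"], peer_amounts) > z_cutoff:
        score += 1.0
    if graph_flag(claim["provider"], links, known_bad):
        score += 1.0
    return score


# Hypothetical inputs: peer charges for the same procedure and a provider network.
peer_amounts = [100, 110, 90, 105, 95]
links = {"D1": ["D2"], "D3": ["D4"]}
known_bad = {"D2"}

suspicious = {"provider": "D1", "units": 2, "amount": 500}
ordinary = {"provider": "D3", "units": 1, "amount": 100}

print(risk_score(suspicious, peer_amounts, links, known_bad))
print(risk_score(ordinary, peer_amounts, links, known_bad))
```

In this example the first claim is far from its peers and comes from a provider linked to a known bad actor, so two of the three models contribute to its score, while the second claim triggers none.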

4. Predictive Analytics to Improve Outcomes

Initiatives such as meaningful use are accelerating the adoption of Electronic Health Records (EHR), and the volume and detail of patient information are growing rapidly. The surge in the creation and broadening use of EHR was driven in part by a $30 billion federal government stimulus, provided by the Health Information Technology for Economic and Clinical Health (HITECH) Act. The Act was designed specifically to provide incentives to adopt EHR and then encourage the sharing of patient information by clinicians everywhere in an attempt to lower costs, speed diagnosis, and improve patient outcomes. Being able to combine and analyze a variety of structured and unstructured data across multiple data sources improves the accuracy of diagnosing patient conditions, matching treatments with outcomes, and predicting which patients are at risk for disease or readmission.

Predictive modeling over data derived from EHRs is being used for early diagnosis and is reducing mortality rates from problems such as congestive heart failure and sepsis. Congestive Heart Failure (CHF) accounts for the most healthcare spending. The earlier it is diagnosed, the better it can be treated, avoiding expensive complications, but early manifestations can be easily missed by physicians. An example from Georgia Tech demonstrated that machine learning algorithms could look at many more factors in patients’ charts than doctors can, and that adding more features substantially increased the model’s ability to distinguish people who have CHF from people who don’t.

Predictive modeling and machine learning on large sample sizes, with more patient data, can uncover nuances and patterns that couldn’t be previously uncovered. Optum Labs has collected EHRs of over 30 million patients to create a database for predictive analytics tools that will help doctors make big data-informed decisions to improve patients’ treatment.

5. Real-time Monitoring of Patients

Healthcare facilities are looking to provide more proactive care to their patients by constantly monitoring patient vital signs. The data from these various monitors can be analyzed in real time, with alerts sent to care providers so they know instantly about changes in a patient’s condition. Processing real-time events with machine learning algorithms can provide physicians with insights to help them make lifesaving decisions and allow for effective interventions.
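As a rough illustration of real-time alerting, the sketch below consumes a stream of heart-rate readings and raises an alert when a short rolling average leaves an allowed range. It deliberately substitutes a simple threshold rule for the machine learning algorithms mentioned above; the function name, the window size, and the limits are hypothetical values chosen for the example, not clinical guidance.

```python
from collections import deque


def heart_rate_alerts(readings, low=50, high=120, window=3):
    """Yield (timestamp, rolling_average) when the average leaves [low, high].

    `readings` is any iterable of (timestamp, beats_per_minute) pairs, so the
    same function works on a live feed or on a recorded list.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    for timestamp, bpm in readings:
        recent.append(bpm)
        if len(recent) < window:
            continue  # wait for a full window to avoid alerting on one spike
        average = sum(recent) / window
        if not low <= average <= high:
            yield timestamp, average


# Hypothetical readings: (seconds since start, beats per minute).
stream = [(0, 80), (1, 85), (2, 130), (3, 140), (4, 150)]

for timestamp, average in heart_rate_alerts(stream):
    print(f"t={timestamp}s: rolling average {average:.0f} bpm is out of range")
```

Averaging over a window trades a little latency for fewer false alarms from single noisy readings; a real deployment would tune that trade-off per device and per patient.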

Wearable sensors and devices present the opportunity for caregivers to interact with patients in entirely new ways, making healthcare more convenient and persistent. Real-time monitoring changes the very nature of the relationship in that face-to-face care is not always a necessity. As an example, applications are being used for remote or in-home monitoring of patients with chronic obstructive pulmonary disease. Other monitors track the weight of patients battling obstructive heart disease to detect fluid retention before hospitalization is required. Still others track a child’s asthma medication usage to be sure home caregivers and family members are aware of what needs to be administered, reducing visits to the ER. As is so often the case with new data volumes in healthcare, sensor data from wearable monitors is unstructured data that is well suited to the data acquisition and storage capabilities of Hadoop, as well as to the power and flexibility of advanced big data analytics.


There is a move toward evidence-based medicine, which involves making use of all clinical data available and factoring that into clinical and advanced analytics. Capturing and bringing all of the information about a patient together gives a more complete view for insight into care coordination and outcomes-based reimbursement, population health management, and patient engagement and outreach. Gaining this 360-degree view of the patient can also eliminate redundant and expensive testing, reduce errors in administering and prescribing drugs, and even avoid preventable deaths.

It is also noteworthy that in today’s healthcare environment, a clear majority of the data generated and therefore available for use (75% or more by some estimates) is unstructured. It emerges from sources like the rapidly growing number of digital devices and sensors, emails, doctors’ and nurses’ notes, laboratory tests, and third-party sources outside the hospital. The unstructured nature of this data, along with the sheer volume generated, makes healthcare data a perfect match for the MapR Data Platform. The MapR Platform can acquire and store enormous masses of structured and unstructured data of any type, running on powerful, cost-effective hardware. Then, with the overlay of advanced big data analytics, healthcare providers and executives can make great leaps ahead in improving patient outcomes while lowering the costs of doing so.

This blog post was published February 13, 2017.