Big Data Analytics Use Cases: R/X For Healthcare

Contributed by

8 min read

In the US, more money is spent on healthcare – about $3.2 trillion last year – than in any other discrete segment of the economy. By some estimates, that is about $600 billion more than should be spent, given the country’s size and wealth. These costs are pummeling consumers and their employers around the world. This year the global costs of employee healthcare benefits will rise more than 9%, according to recent research. In the US the increase will be 15%. Thus the pressure on healthcare providers to reduce costs is enormous. What’s more, these providers are also tasked with improving patient care and treatment outcomes.

Healthcare providers have made strides in collecting and sharing massive amounts of data in electronic health records and health information exchanges in an effort to streamline patient care and reduce costs. Until recently, these efforts had little impact on costs, or on patient outcomes for that matter.

What had been lacking is the analytics piece of the equation, the missing link to making sense of all this and other data. By one conservative estimate, applying Big Data analytics on a health system-wide basis could reduce healthcare spending in the US by a staggering $300-$450 billion annually. Moreover, there is every reason to believe that patient outcomes will improve as well, perhaps substantially.

Here below are some of the more common use cases for Hadoop and Big Data analytics in healthcare. It is worth noting that in the healthcare environment, more than three quarters of the data available for use is unstructured data. It comes from an ever-growing list of sources, including thousands of devices and sensors, medical staff notes, lab tests, imaging reports, and many outside sources of medical information.

This is data that is the bread and butter for the Hadoop platform. It isn’t just the unstructured nature of most healthcare data that matches up well with Hadoop. It is also the sheer enormity of the data volumes from these and other sources, which would easily overwhelm traditional analytics platforms. Hadoop is capable of acquiring and storing gigantic masses of any kind of data. And Hadoop’s ability to run on inexpensive but very powerful commodity hardware allow the platform to handle just about any number of tasks concurrently.

Leveraging electronic health records (EHR)

In the near term, this is arguably the most important use case for Big Data analytics on Hadoop, mainly owing to the tremendous effort in the past decade to collect and store this data. Now it needs be analyzed and broadly shared among clinicians anywhere and everywhere.

The potential cost savings are massive, deriving from benefits like reducing duplication of tests and reducing the number of patient-doctor visits. With Hadoop and Big Data analytics, clinicians can spot links between different diseases to better predict other diseases that may result, and therefore be prevented in some cases.

Detection of Fraud, Waste and Abuse

The fraud potential is obviously very high in an environment where massive sums of money can pass through multiple providers and payees to pay for services billed by scores of groups, individuals, departments and others – at times using multiple currencies. Fraud is also rife today when it comes to proper determination of eligibility in Medicaid, Medicare, other managed care organizations as well as in employer and private health plans.

Hadoop and the analytics tools available today are particularly adept at analyzing various nuances and anomalies across a variety of inputs, including payment data, voluminous patient records, and benchmark data for multiple common procedures. For example, Hadoop analysts can identify oddities such as a patient’s receiving services from multiple providers at the same time; hospitals over-utilizing certain services in a given time period; or patients filling the same prescription at different places.

Moreover, using Hadoop, analysts can act on potential anomalies before payment is ever made. It has been a game-changer for insurers, since retrospective investigations only reap up to 25% of the original claim.

One leading healthcare provider used a ‘data lake’ approach to aggregate huge volumes of data that then served as a data hub for various departments, including fraud prevention. Today the provider is en route to capturing an incremental 20% of fraud, waste and abuse in its claims department. And UnitedHealthcare leveraged its Hadoop Big Data platform to drive millions in annual savings through flagging high-risk providers, among other initiatives.

Finally, healthcare providers are understandably concerned about security in general. Thus the ability to co-locate data and making it available for multi-tenancy is often a requirement for a Big Data platform. However, healthcare providers require that the very highest levels of customizable security accompany this multi-tenancy feature.

Speedier claims processing

Doing just about anything faster equates to less overall cost, and that is true with healthcare claims processing. Often today, slow processing is a result of these challenges: legacy platforms that support increasingly unstructured formats, growth rates beyond what the platforms can handle and the increasing number of external sources.

Hadoop works with solutions like Hive, Pig and HBase to migrate these workloads off sluggish legacy systems and markedly speed up claims processes. Also, such schemes allow claims processors to analyze all data in the payment system, not on subsets as is done on the legacy platforms.

Genome sequencing and precision medicine

The Human Genome Project was begun in 1990 and declared officially completed almost 13 years later – at a huge cost in money and research time. With Hadoop and Big Data analytics, this same sequencing is now on its way to commodity status. This will give caregivers unprecedented information both on groups of patients and individuals to determine disease susceptibility and obvious preventative measures to stop the disease from ever occurring. This is also known as ‘precision medicine’.

Again here we have a tailor-made application for Hadoop. Genomic sequences are massive data files. Analyzing these files produce even more massive files, both of which would choke a conventional relational database and its associated storage. By contrast, a Hadoop cluster makes relatively quick and easy work, regardless of file complexity or size.


There are several other prominent uses cases emerging for Hadoop and Big Data analytics (see chart below). They are all owing to the peerless capability of Hadoop solutions to ingest easily and store massive amounts of disparate data from just about any source. Then the growing number of robust Big Data analytics solutions can crunch this data in heretofore unimaginable ways, doing it all on commodity hardware at blinding speeds.

In selecting a Big Data platform and particular Hadoop distribution, be sure the platform is highly adept at handling the mix of data types in healthcare typically housed in silos, with clinical data in one silo; pharmaceutical data in another; and logistics information on hospital supplies in yet another. This platform should be flexible enough so that caregivers can use complex data like doctors’ notes and imaging files for real patient analysis, not just for archiving.

Additional Resources

This blog post was published February 21, 2017.

50,000+ of the smartest have already joined!

Stay ahead of the bleeding edge...get the best of Big Data in your inbox.

Get our latest posts in your inbox

Subscribe Now