How to Use Data Science and Machine Learning to Revolutionize 360° Patient Views in Health Care


No other industry can get more value from knowing its customers than the healthcare industry. This post is the third in a series of examples showing how MapR data scientist Joe Blue helped MapR customers, in this case a health insurance company, identify new data sources and apply machine learning algorithms to better understand their customers. If you have not already read the first and second parts of this customer 360° series, it would be good to read those first. In this third part, we cover a patient 360° example, presenting the before, during, and after. The goal of the patient 360° revolution is to personalize and target healthcare for better outcomes.

Patient 360

Specifically, the goal of the patient 360° revolution is to:

  • Find additional information
  • Accentuate patient treatment expertise with new learning
  • Use new learning to personalize engagement and improve outcomes

Before, During and After

Use Case: Health Insurance Before

Health insurance story: There are many Patient 360 examples; this one focuses on cervical cancer screening. According to the CDC, more than 12,000 women get cervical cancer every year. Screening tests help prevent cervical cancer by detecting HPV and/or abnormal cells that could cause or develop into cancer. Many deaths each year could be prevented if screening were done on the right schedule. The goal of this Patient 360 is to recommend to female health insurance members when they should get their screening. A typical "before" situation is rules-based: it uses limited, siloed data and sends out mass emails. Rules produce one big bucket of people who are all treated as equally in need of a screening. Rules-based approaches rely on thresholds derived from averages across a population and, as a result, miss opportunities to deliver more targeted, personalized methods or treatment. Mass emailing is easy, but a more targeted and personalized method is more effective.

Limited Siloed Data

A better approach is to leverage machine learning with heterogeneous data sources to surpass the one-size-fits-all, rules-based approach. With machine learning, you can analyze large, heterogeneous datasets to identify the individuals who will benefit the most, maximizing impact while minimizing effort. Instead of treating every member alike, a model can rank women by how likely they are not to have had a screening.

Health Insurance Customer 360 During = Data Science

There are a lot of different data sources for patient data, some of which traditional analytics or databases cannot take advantage of. Combining different data sources with multiple learning algorithms using ensemble methods can give better predictive performance than could be obtained from a single algorithm or data source.

NLP, or text analytics (such as TF/IDF, described in part 1 of this series), applied to doctors' notes can find clues about whether a patient was screened or not.
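To make the text analytics idea concrete, here is a minimal TF/IDF sketch in plain Python. The notes are invented for illustration, and the `tokenize` and `tf_idf` helpers are hypothetical names, not part of any MapR tooling; a production pipeline would likely use a dedicated NLP library and real clinical text.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many notes contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical doctor's notes, invented for this example.
notes = [
    "patient reports pap smear completed last year",
    "patient seen for seasonal allergies",
    "patient due for annual physical",
]

weights = tf_idf(notes)
# Terms that are distinctive for the first note rank highest.
top_terms = sorted(weights[0], key=weights[0].get, reverse=True)[:3]
print(top_terms)
```

A word such as "patient" appears in every note, so its weight is zero, while rarer terms such as "pap" and "smear" stand out as possible screening clues.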

NLP Finds Screening Clues

Information in claims and prescription data can be used to segment or cluster similar patients based on diagnoses, procedures, and prescriptions. These patient similarity clusters can be used, much as recommendation engines work, to analyze which combinations of diagnoses, procedures, and medications generally go together with screening.
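One simple way to picture patient similarity is sketched below. It uses Jaccard similarity over sets of claim codes and looks at the screening history of the most similar patients, which is a stand-in for the clustering described above rather than the method used in the original project. All patient records and code labels are made up.

```python
def jaccard(a, b):
    """Overlap between two sets of codes, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def neighbor_screening_rate(target_codes, patients, k=2):
    """Share of the k most similar patients who have a known screening."""
    ranked = sorted(
        patients,
        key=lambda p: jaccard(target_codes, p["codes"]),
        reverse=True,
    )
    nearest = ranked[:k]
    if not nearest:
        return None
    return sum(p["screened"] for p in nearest) / len(nearest)

# Hypothetical patients with invented diagnosis, procedure, and prescription labels.
patients = [
    {"id": "A", "codes": {"dx_gyn_exam", "rx_contraceptive", "proc_office_visit"}, "screened": True},
    {"id": "B", "codes": {"dx_gyn_exam", "rx_contraceptive"}, "screened": True},
    {"id": "C", "codes": {"dx_allergy", "rx_antihistamine"}, "screened": False},
]

# A member whose screening status is unknown.
unknown_member_codes = {"dx_gyn_exam", "rx_contraceptive"}
rate = neighbor_screening_rate(unknown_member_codes, patients, k=2)
print(rate)
```

If members with a similar claims profile were mostly screened, that is weak evidence this member was too; a low rate among similar members points the other way.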


Graph analytics on claims can look at the relationships between patients, doctors, and screenings, which can answer questions such as: "What is the rate of screening for the doctor this patient goes to?"
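The provider question can be sketched by treating each claim as an edge between a patient and a doctor. The claim tuples, provider names, and the `provider_screening_rates` function below are hypothetical, and a real deployment would run this on a graph or distributed processing engine rather than in-memory dictionaries.

```python
from collections import defaultdict

# Each claim is an edge: (patient_id, provider_id, procedure). All values are invented.
claims = [
    ("p1", "dr_lee", "screening"),
    ("p1", "dr_lee", "office_visit"),
    ("p2", "dr_lee", "office_visit"),
    ("p3", "dr_kim", "screening"),
    ("p4", "dr_kim", "screening"),
    ("p5", "dr_lee", "office_visit"),
]

def provider_screening_rates(claims):
    """For each provider, the share of their distinct patients with a screening claim."""
    patients = defaultdict(set)
    screened = defaultdict(set)
    for patient, provider, procedure in claims:
        patients[provider].add(patient)
        if procedure == "screening":
            screened[provider].add(patient)
    return {
        provider: len(screened[provider]) / len(seen)
        for provider, seen in patients.items()
    }

rates = provider_screening_rates(claims)
for provider, rate in sorted(rates.items()):
    print(provider, round(rate, 2))
```

A patient whose doctor screens only a third of their patients is, all else being equal, less likely to have been screened than a patient of a doctor who screens everyone.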

Graph Analytics for Common Providers

All of these different data sources and algorithms can be combined to find the insurance customers least likely to have had a screening and most in need of the message to get screened.
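As a rough illustration of combining signals, the sketch below takes a weighted average of three per-member scores, where each score is assumed to estimate how likely the member is to be unscreened. The signal names, values, and weights are all hypothetical; a trained ensemble would learn how to combine the inputs from labeled data rather than use hand-picked weights.

```python
def combine_scores(signals, weights):
    """Weighted average of per-source scores (higher means more likely unscreened)."""
    total_weight = sum(weights.values())
    return sum(weights[name] * signals[name] for name in weights) / total_weight

# Hypothetical per-member scores from the three sources discussed above.
members = {
    "m1": {"notes": 0.9, "similar_patients": 0.8, "provider": 0.7},
    "m2": {"notes": 0.1, "similar_patients": 0.2, "provider": 0.3},
    "m3": {"notes": 0.5, "similar_patients": 0.6, "provider": 0.4},
}

# Hand-picked weights, chosen only for illustration.
weights = {"notes": 0.5, "similar_patients": 0.3, "provider": 0.2}

combined = {member: combine_scores(signals, weights) for member, signals in members.items()}
# Members most likely to be unscreened come first.
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)
```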

Cervical Cancer Screening - DURING

Health Insurance Customer 360 After

The health insurance company can now apply this new knowledge about the members most in need of a screening to change from mass emailing to a more efficient, targeted approach. With the new learning, they can rank women in need of a screening from highest to lowest. The 5% of women who are least likely to have had a screening could be called on the telephone, the next 10% could be flagged for the provider to discuss with them, and the rest could be sent an email. This helps the insurer reach more people in need of a screening with the existing resources.
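The tiered outreach described above can be sketched as a simple ranking and cutoff step. The 5% and 10% shares come from the text; the `assign_outreach` function, the channel labels, and the synthetic scores are assumptions made for this example.

```python
from collections import Counter

def assign_outreach(scores, phone_share=0.05, provider_share=0.10):
    """Map each member to an outreach channel based on rank by score."""
    # Highest score first: these members are the most likely to be unscreened.
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    phone_cutoff = round(n * phone_share)
    provider_cutoff = round(n * (phone_share + provider_share))
    plan = {}
    for position, member in enumerate(ranked):
        if position < phone_cutoff:
            plan[member] = "phone"
        elif position < provider_cutoff:
            plan[member] = "provider_flag"
        else:
            plan[member] = "email"
    return plan

# Synthetic scores for 100 hypothetical members: m99 scores highest, m0 lowest.
scores = {f"m{i}": i / 100 for i in range(100)}
plan = assign_outreach(scores)
channel_counts = Counter(plan.values())
print(channel_counts)
```

With 100 members, this yields 5 phone calls, 10 provider flags, and 85 emails, so the most expensive channel is reserved for the members who need it most.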

Cervical Cancer Screening - AFTER

Also, once these different data sources have been made available on the MapR platform, much more can be done with this data for other use cases.

Building a Healthcare Data Lake on MapR


The overall theme for the customer 360 is that you have structured data, which you are good at mining, but if you could combine this with more unstructured data sources and machine learning you could extract more value from your data.

In this example, we discussed how data science can combine different sources of health insurance data with machine learning to better target recommendations for cancer screenings.


This blog post was published October 30, 2017.
