Contributor: Carol McDonald

MapR Converge Blog author, Carol McDonald

Carol has extensive experience as a developer and architect building complex, mission-critical applications in the banking, health insurance, and telecom industries. As a Java Technology Evangelist at Sun Microsystems, Carol traveled all over the world speaking at Sun Tech Days, Java User Groups (JUGs), companies, and conferences. She is a recognized speaker in Java communities.

Blog Posts by Carol McDonald

February 08, 2018 | By Carol McDonald

Fast data processing pipeline for predicting flight delays using Apache APIs: Kafka, Spark Streaming and Machine Learning (part 3)

Editor's Note: This is a 3-part series. See the previously published posts below: Part 1 - Spark Machine Learning; Part 2 - Kafka and Spark Streaming. Fast Data Processing Pipeline for Predicting Flight Delays Using Apache APIs: Kafka, Spark Machine...

January 10, 2018 | By Carol McDonald

Fast data processing pipeline for predicting flight delays using Apache APIs: Kafka, Spark Streaming and Machine Learning (part 1)

Editor's Note: You can find Part 2 of this series here. According to Thomas Davenport in the HBR, analytical technology has changed dramatically over the last decade, with more powerful and less expensive distributed computing across commodity servers...

January 10, 2018 | By Carol McDonald

Fast data processing pipeline for predicting flight delays using Apache APIs: Kafka, Spark Streaming and Machine Learning (part 2)

Editor's Note: You can find Part 1 of this series here. According to Bob Renner, CEO of MapR Partner Liaison Technologies, in Forbes' machine learning predictions for 2018, the possibility to blend machine learning with real-time transactional data...

October 30, 2017 | By Carol McDonald

How to Use Data Science and Machine Learning to Revolutionize 360° Patient Views in Health Care

No other industry can get more value from knowing its customers than the healthcare industry. This post is the third in a series where we will go over examples of how MapR data scientist Joe Blue assisted MapR customers, in this case a health insurance...

October 26, 2017 | By Carol McDonald

Data Modeling Guidelines for NoSQL JSON Document Databases

In this blog post, I’ll discuss how NoSQL data modeling is different from traditional relational schema data modeling, and I’ll also provide you with some guidelines for document database data modeling. Document databases, such as MapR-DB, are sometimes...

October 25, 2017 | By Carol McDonald

ETL Pipeline to Transform, Store and Explore Healthcare Dataset With Spark SQL, JSON and MapR-DB

This post is based on a recent workshop I helped develop and deliver at a large health services and innovation company’s analytics conference. This company is doing a lot of interesting analytics and machine learning on top of the MapR Converged Data...

September 29, 2017 | By Carol McDonald

Database Comparison: An In-Depth Look at How MapR-DB Does What Cassandra, HBase, and Others Can't

If you are a developer or architect working with a highly performant product, you want to understand what differentiates it, just as a race car driver wants to understand a highly performant car. My in-depth architecture blog posts such as An In-Depth Look at...

August 16, 2017 | By Carol McDonald

Demystifying AI, Machine Learning and Deep Learning

Deep learning, machine learning, artificial intelligence: all buzzwords and representative of the future of analytics. In this post we will explain what machine learning and deep learning are at a high level, with some real-world examples. In future posts...

July 25, 2017 | By Carol McDonald

Applying Machine Learning to Streaming IoT for Connected Medical Devices

The combination of IoT data, streaming analytics, machine learning, and distributed computing has become more powerful and less expensive than before, enabling the storage and analysis of more data and many different types of data much faster. Some...

June 27, 2017 | By Carol McDonald

Big Data-Driven Knowledge: Trends That Are Transforming the Retail Sector

Editor's Note: The retail industry is on the verge of a major transformation through the use of advanced analytics and big data technologies like the MapR Converged Data Platform. The 'MapR Guide to Big Data in Retail' provides a comprehensive...

June 09, 2017 | By Carol McDonald

End to End Application for Monitoring Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase – Part 4: Spark Streaming, DataFrames, and HBase

Editor's Note: This is a 4-part series. See the previously published posts below: Part 1 - Spark Machine Learning; Part 2 - Kafka and Spark Streaming; Part 3 – Real-Time Dashboard Using Vert.x. According to Gartner, 20.8 billion connected things will...

June 05, 2017 | By Carol McDonald

Churn Prediction with Apache Spark Machine Learning

Editor’s Note: On June 15, 2017, join Carol McDonald for a live machine learning tutorial on predicting customer churn. Register here. Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel...

May 15, 2017 | By Carol McDonald

How to Get Started with Spark Streaming and MapR Streams Using the Kafka API

The telecommunications industry is on the verge of a major transformation through the use of advanced analytics and big data technologies like the MapR Converged Data Platform. The MapR Guide to Big Data in Telecommunications is designed to help you understand...

May 11, 2017 | By Carol McDonald

Fast Cars, Big Data - How Streaming Data Can Help Formula 1

A Formula 1 race is a high-speed example of the Internet of Things, where gathering, analyzing, and acting on tremendous amounts of data in real time is essential for staying competitive. The sport’s use of such information is so sophisticated that some...

May 09, 2017 | By Carol McDonald

Big Data Opportunities for Telecommunications

The telecommunications industry is on the verge of a major transformation through the use of advanced analytics and big data technologies like the MapR Converged Data Platform. The MapR Guide to Big Data in Telecommunications is designed to help you understand...

May 04, 2017 | By Carol McDonald

End to End Application for Monitoring Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase – Part 3: Real-Time Dashboard Using Vert.x

Editor's Note: This is a 4-part series. See the previously published posts below: Part 1 - Spark Machine Learning; Part 2 - Kafka and Spark Streaming. According to Gartner, smart cities will be using about 1.39 billion connected cars, IoT sensors, and...

April 20, 2017 | By Carol McDonald

Transforming Healthcare Through Big Data

Editor's Note: This article was originally featured on Healthcare IT Outcomes. Healthcare has entered an era of major data transformation spurred by the use of advanced analytics and Big Data technologies. The catalyst for this transformation includes...

February 27, 2017 | By Carol McDonald

5 Big Data Production Examples in Healthcare

Editor's Note: This blog post is an excerpt from the MapR Guide to Big Data in Healthcare. To read more, you can download it here. Healthcare costs are driving the demand for big data-driven healthcare applications. Technology decision-makers in healthcare...

February 13, 2017 | By Carol McDonald

5 Big Data Trends in Healthcare for 2017

This blog post is an excerpt from the MapR Guide to Big Data in Healthcare. To read more, you can download it here. The healthcare industry, perhaps more than any other, is on the brink of a major transformation through the use of advanced analytics...

February 08, 2017 | By Carol McDonald

Event Driven Microservices Patterns

In this blog we will discuss some patterns that are often used in microservices applications that need to scale: Event Stream, Event Sourcing, Polyglot Persistence, Memory Image, and Command Query Responsibility Segregation. The Motivation: Uber, Gilt and others...

January 05, 2017 | By Carol McDonald

End to End Application for Monitoring Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase – Part 2: Kafka and Spark Streaming

Editor's Note: This is a 4-part series. See the previously published posts below: Part 1 - Spark Machine Learning; Part 3 – Real-Time Dashboard Using Vert.x. This post is the second part in a series where we will build a real-time example for analysis...

December 08, 2016 | By Carol McDonald

How to Use Data Science and Machine Learning to Revolutionize 360° Customer Views (Part 2)

This post is the second in a series where we will go over examples of how MapR data scientist Joe Blue assisted MapR customers, in this case a regional bank, to identify new data sources and apply machine learning algorithms in order to better understand...

November 28, 2016 | By Carol McDonald

End to End Application for Monitoring Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase – Part 1: Spark Machine Learning

Editor's Note: This is a 4-part series. See the previously published posts below: Part 2 – Kafka and Spark Streaming; Part 3 – Real-Time Dashboard Using Vert.x. According to Gartner, by 2020, a quarter of a billion connected cars will form a major element...

October 17, 2016 | By Carol McDonald

Predicting Breast Cancer Using Apache Spark Machine Learning Logistic Regression

In this blog post, I’ll help you get started using Apache Spark’s spark.ml Logistic Regression for predicting cancer malignancy. The goal of Spark’s spark.ml library is to provide a set of APIs on top of DataFrames that help users create and tune machine learning...

August 29, 2016 | By Carol McDonald

How Stream-First Architecture Patterns Are Revolutionizing Healthcare Platforms

Building a robust, responsive, and secure data service for healthcare is tricky. For starters, healthcare data lends itself to multiple models: document representation for patient profile views or updates, graph representation to query relationships between...

August 08, 2016 | By Carol McDonald

How to Use Data Science and Machine Learning to Revolutionize 360° Customer Views

More and more data is available to help inform businesses about their customers, and the businesses that successfully use these new sources and quantities of data will be able to provide a superior customer experience. However...

July 12, 2016 | By Carol McDonald

Predicting Loan Credit Risk using Apache Spark Machine Learning Random Forests

In this blog post, I’ll help you get started using Apache Spark’s spark.ml random forests for classification of bank loan credit risk. The goal of Spark’s spark.ml library is to provide a set of APIs on top of DataFrames that help users create and tune machine...

June 29, 2016 | By Carol McDonald

Using Apache Spark SQL to Explore S&P 500 and Oil Stock Prices

This post will use Apache Spark SQL and DataFrames to query, compare, and explore S&P 500, Exxon, and Anadarko Petroleum Corporation stock prices for the past 5 years. Stocks and oil prices have tended to move together over the past decade, as explained...

June 07, 2016 | By Carol McDonald

How Big Data is Reducing Costs and Improving Outcomes in Health Care

The Motivation for Big Data: Health care costs are driving the demand for big data-driven healthcare applications. U.S. health care spending has outpaced GDP growth for the past several decades and exceeds spending in any other developed country. Despite...

May 03, 2016 | By Carol McDonald

Real Time Credit Card Fraud Detection with Apache Spark and Event Streaming

Editor's Note: Have questions about the topics discussed in this post? Search for answers and post questions in the Converge Community. In this post we are going to discuss building a real-time solution for credit card fraud detection. There are 2...

April 22, 2016 | By Carol McDonald

Real-Time Streaming Data Pipelines with Apache APIs: Kafka, Spark Streaming, and HBase

Many of the systems we want to monitor produce a stream of events. Examples include event data from web or mobile applications, sensors, or medical devices. Real-time analysis examples include: website monitoring, network monitoring, fraud detection...

March 08, 2016 | By Carol McDonald

How to Get Started Using Apache Spark GraphX with Scala

Editor's Note: Don't miss our new free on-demand training course about how to create data pipeline applications using Apache Spark – learn more here. This post will help you get started using Apache Spark GraphX with Scala on the MapR Sandbox...

February 22, 2016 | By Carol McDonald

Apache Spark Machine Learning Tutorial

Editor's Note: Don't miss our upcoming Free Code Friday on July 1st. Carol will give an overview of machine learning with Apache Spark’s MLlib, and you’ll also learn how MLlib decision trees can be used to predict flight delays. Register for the...

December 09, 2015 | By Carol McDonald

High Speed Kafka API Publish Subscribe Streaming Architecture: How it works at the message level

MapR Streams brings integrated publish/subscribe messaging to the MapR Converged Data Platform. In this post, we will give a high-level overview of the components of MapR Streams. Then, we will follow the life of a message from a producer to a consumer...

November 02, 2015 | By Carol McDonald

MapReduce Design Patterns Implemented in Apache Spark

This blog is the first in a series that discusses some design patterns from the book MapReduce Design Patterns and shows how these patterns can be implemented in Apache Spark®. When writing MapReduce or Spark programs, it is useful to think about the...

September 04, 2015 | By Carol McDonald

Spark Streaming with HBase

This post will help you get started using Apache Spark Streaming with HBase on the MapR Sandbox. Spark Streaming is an extension of the core Spark API that enables continuous data stream processing. Editor’s Note: Download our free E-Book Getting Started...

August 31, 2015 | By Carol McDonald

Apache Drill Architecture: The Ultimate Guide

In this blog post, we’ll take a look at the inner workings of Apache Drill, learn what services are involved, and find out what happens in Apache Drill when we submit a query. Editor’s note: This post is derived from the DA 410 Apache Drill Essentials...

August 07, 2015 | By Carol McDonald

An In-Depth Look at the HBase Architecture

In this blog post, I’ll give you an in-depth look at the HBase architecture and its main benefits over other NoSQL data store solutions. Be sure to read the first blog post in this series, titled “HBase and MapR-DB: Designed for Distribution, Scale, and Speed...

August 06, 2015 | By Carol McDonald

Guidelines for HBase Schema Design

In this blog post, I’ll discuss how HBase schema is different from traditional relational schema modeling, and I’ll also provide you with some guidelines for proper HBase schema design. Relational vs. HBase Schemas: There is no one-to-one mapping from...

August 03, 2015 | By Carol McDonald

Parallel and Iterative Processing for Machine Learning Recommendations with Spark

Recommendation systems help narrow your choices to those that best meet your particular needs, and they are among the most popular applications of big data processing. In this post we are going to discuss building a recommendation model from movie ratings...

June 30, 2015 | By Carol McDonald

Getting Started with the Spark Web UI

Summary: This post will help you get started using the Apache Spark Web UI to understand how your Spark application is executing on a Hadoop cluster. The Spark Web UI displays useful information about your application, including: a list of scheduler stages...

June 26, 2015 | By Carol McDonald

HBase and MapR-DB: Designed for Distribution, Scale, and Speed

Apache HBase is a database that runs on a Hadoop cluster. HBase is not a traditional RDBMS, as it relaxes the ACID (Atomicity, Consistency, Isolation, and Durability) properties of traditional RDBMS systems in order to achieve much greater scalability...

June 24, 2015 | By Carol McDonald

Using Apache Spark DataFrames for Processing of Tabular Data

This post will help you get started using Apache Spark DataFrames with Scala on the MapR Sandbox. The new Spark DataFrames API is designed to make big data processing on tabular data easier. What is a Spark DataFrame? A Spark DataFrame is a distributed...

April 09, 2015 | By Carol McDonald

An Inside Look at the Components of a Recommendation Engine

Recommendation engines help narrow your choices to those that best meet your particular needs. In this post, we’re going to take a closer look at how all the different components of a recommendation engine work together. We’re going to use collaborative...

January 06, 2015 | By Carol McDonald

How to Use SQL, Hadoop, Drill, REST, JSON, NoSQL, and HBase in a Simple REST Client

SQL will become one of the most prolific use cases in the Hadoop ecosystem, according to Forrester Research. Apache Drill is an open source SQL query engine for big data exploration. REST services and clients have emerged as popular technologies on the...
