Contributor: Mathieu Dumoulin


Mathieu is a Data Engineer on the MapR Professional Services team, based in the Asia-Pacific region. He started using Hadoop in 2012 at the Fujitsu Canada Innovation Lab, where he built a large-scale text classification system from scratch. Since then, Mathieu has split his time between being a Search Engineer and managing a new Data Science team for a large Japanese HR company. His current interests are focused on Apache Drill, Apache Spark, and Deep Learning. Mathieu holds both a B.A.Sc. in Computer Science and a Master of Computer Science degree from the Université Laval in Canada.

Blog Posts by Mathieu Dumoulin

August 02, 2017 | By Mathieu Dumoulin

How to Set Up MapR-DB to Elasticsearch Replication

Use Case for Elasticsearch Replication: Automatic replication of MapR-DB data to Elasticsearch is useful for many environments. There are some great use cases I can think of for taking advantage of this great feature. Full text search of data in MapR...

July 31, 2017 | By Mathieu Dumoulin

Real-Time Anomaly Detection Streaming Microservices with H2O and MapR – Part 3: Production Deployment of ML Model With a Streaming Microservice

Editor’s Note: This is Part Three of a three-blog series. Read Part 1: Architecture and Part 2: Modeling. Introduction: In this blog series, we cover the architecture of a real-time predictive maintenance system. It’s a more detailed version of our Strata...

July 24, 2017 | By Mathieu Dumoulin

Real-Time Anomaly Detection Streaming Microservices with H2O and MapR – Part 2: Modeling

Editor’s Note: Read Part 1: Architecture and Part 3: Production Deployment. Introduction: Predictive maintenance is a great use case for machine learning, where data collected from sensors applied to industrial equipment is analyzed for early warning signs...

July 17, 2017 | By Mathieu Dumoulin

Real-Time Anomaly Detection Streaming Microservices with H2O and MapR – Part 1: Architecture

Editor’s Note: Read Part 2: Modeling and Part 3: Production Deployment. Converged Architecture for Real-time Anomaly Detection for IoT Sensor Data: Industry 4.0 IoT applications promise vast gains in productivity from reduced downtime, higher product quality...

July 07, 2017 | By Mathieu Dumoulin

Configure Jupyter Notebook for Spark 2.1.0 and Python

I'll guess that many people reading this have spent time wrestling with configuration to get Python and Spark to play nicely. Having gone through the process myself, I've documented my steps to share the knowledge, hoping it will save some time...

May 31, 2017 | By Mathieu Dumoulin

Performance Tuning of an Apache Kafka/Spark Streaming System - Telecom Case Study

Real-world case study in the telecom industry. Editor's Note: The telecommunications industry is on the verge of a major transformation through the use of advanced analytics and big data technologies like the MapR Converged Data Platform. The "...

April 04, 2017 | By Mathieu Dumoulin

Kafka REST Proxy - Performance Tuning for MapR Streams

MapR Streams is a “Kafka-esque” message streaming system which, similarly to Apache Kafka, provides very high throughput performance combined with low message latency and high reliability. Unique to MapR Streams, however, is a broker-less design that...

January 17, 2017 | By Mathieu Dumoulin

Performance Tuning of an Apache Kafka/Spark Streaming System

Real-world case study in the telecom industry. Debugging a real-life distributed application can be a pretty daunting task. Most common Google searches don't turn out to be very useful, at least at first. In this blog post, I will give a fairly detailed...

January 10, 2017 | By Mathieu Dumoulin

Real-time Smart City Traffic Monitoring Using Microservices-based Streaming Architecture (Part 2)

Modern Open Source Complex Event Processing For IoT: This series of blog posts details my findings as I bring to production a fully modern take on Complex Event Processing, or CEP for short. In many applications, ranging from financials to retail and IoT...

January 09, 2017 | By Mathieu Dumoulin

Better Complex Event Processing at Scale Using a Microservices-based Streaming Architecture (Part 1)

A microservice-based streaming architecture combined with an open source rule engine makes real-time business rules easy. This post is intended as a detailed account of a project I undertook to integrate an OSS business rules engine with a modern stream...

November 07, 2016 | By Mathieu Dumoulin

How to Set Up MapR-DB to Elasticsearch Replication

The Use Case: Automatic replication of MapR-DB data to Elasticsearch is useful for many environments, and I want to share information about a specific customer deployment I worked on recently. Their use case is related to log security analytics and is...

May 17, 2016 | By Mathieu Dumoulin

Monitoring a MapR Cluster with Elasticsearch + Kibana

The MapR Converged Data Platform offers a unified API for all aspects of solving real, mission-critical data problems that enterprises have to deal with today. In this blog post, I would like to share another, much less talked about advantage that emerges...

April 29, 2016 | By Mathieu Dumoulin

CLDB Monitoring Using JMX as a Modern Alternative to Ganglia

There are many options for monitoring the performance and health of a MapR cluster. In this post, I will present the lesser-known method for monitoring the CLDB using the Java Management Extensions (JMX). According to one of the most highly regarded MapR...

April 27, 2016 | By Mathieu Dumoulin

Distributed Deep Learning with Caffe Using a MapR Cluster

We have experimented with CaffeOnSpark on a 5-node MapR 5.1 cluster running Spark 1.5.2 and will share our experience, difficulties, and solutions in this blog post. Deep Learning and Caffe: Deep learning is getting a lot of attention recently, with AlphaGo...
