Real-Time Anomaly Detection Streaming Microservices with H2O and MapR – Part 3: Production Deployment of ML Model With a Streaming Microservice


Editor’s Note: This is Part Three of a three-part blog series. Read Part 1: Architecture and Part 2: Modeling.

Introduction

In this blog series, we cover the architecture of a real-time predictive maintenance system. It’s a more detailed version of our Strata Beijing (July 2017) talk on this same topic.

In Part 1: Architecture, we cover the use case and the general architecture of our solution. In Part 2: Modeling, we cover the modeling with H2O and show the code used to train the final model. In this blog, we cover the production deployment.

Our System


Let’s briefly review the system, which we explain in greater detail in Part 1: Architecture. Our working prototype does real-time anomaly detection from the small blue wireless sensor attached to the model industrial robot (in red, above). In addition, we use an augmented reality (AR) headset prototype for output visualization, overlaid on the black and white square located on the arm of the robot.

Using an AR headset allows a factory operator to put on the helmet and see the operational status of the factory and its various machines in real time. The operator can check the status of the various production machinery not just from an abstract dashboard sitting in an office but while physically walking around on the factory floor.


Deployment of Machine Learning to Production

There is surprisingly little information about how to deploy that shiny new machine learning model to production.

In our use case, we’re trying to show the status of the machine, “OK” or “FAILURE,” in real time to a visualization system.

In this case, what is meant by a visualization system is a display that allows the operator to see, in real time, the status of machines equipped with the sensor. This could be an operational dashboard made with Grafana or Kibana or, in our case, our AR helmet.

Deployment Patterns

Data Scientists Use R and/or Python

Integrating a trained model into a production system is a real challenge for the practical application of machine learning features in the enterprise.

Data scientists usually prefer to work with R or Python to develop their models in the first place, but Java is very strong in the enterprise market. While there are many solutions around this problem, H2O has one of the better ones on offer: Export to POJO.

Easy Mode: H2O Export to POJO

H2O comes with a very polished and highly useful “export to POJO” feature.

From H2O documentation: “One of value add of using H2O is the ability to code in any front end API but have the model exportable as a POJO (Plain Old Java Object). This allows the user the flexibility to take the model outside of H2O either to run standalone or integrating the Java Object into a platform like Hadoop’s Storm. The walkthrough below will detail the steps required via the command line to export a model object and use it to score using a sample class object.”

Figure 3: Export to POJO from Flow UI

The export to POJO function can be used directly from the API but is most easily used from the Flow UI, as shown in Figure 3. You get a normal Java class as a text file that can be immediately copied into your favorite development environment.
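To give a feel for how the exported class is used, here is a minimal sketch of scoring a sensor reading from Scala. The generated class name (DeepLearning_model_anomaly) is hypothetical, and the sketch assumes an all-numeric autoencoder model whose score0() call fills the output array with the reconstructed input row; check the class H2O generates for your model before relying on that contract.

```scala
// Minimal sketch: scoring with an exported H2O POJO from Scala.
// Assumptions (not from the original post): the generated class is named
// DeepLearning_model_anomaly, it is compiled on the classpath together with
// h2o-genmodel.jar, and all features are numeric, so the reconstruction
// returned by score0() has the same width as the input row.
import hex.genmodel.GenModel

object PojoScoringSketch {
  // The generated POJO extends GenModel and has a public no-arg constructor.
  val model: GenModel = new DeepLearning_model_anomaly()

  def anomalyScore(row: Array[Double]): Double = {
    val preds = new Array[Double](row.length)
    model.score0(row, preds)                      // fills preds with the reconstruction
    // Mean squared reconstruction error as the anomaly score.
    row.zip(preds).map { case (x, r) => (x - r) * (x - r) }.sum / row.length
  }
}
```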

Our Solution

Streaming Architecture


In a nutshell, our production deployment is a straightforward application of the streaming architecture pattern as best described by Ted Dunning and Ellen Friedman in their book Streaming Architecture (complimentary download). In this approach, we rely on the capabilities of a Kafka-esque stream messaging platform for data input and output.

A stateless microservice runs on any server (or several on a single server) to connect the input to the output, applying some processing to the data along the way.

Real-Time Predictions Using Streaming Microservice Architecture

Figure 4: Real-time predictions

Our system leverages the converged capabilities of the MapR platform as well as the Kafka REST proxy to simplify the implementation as much as possible.

Raw sensor data is produced onto the input stream via the Kafka REST Proxy, saving a good bit of cluster-side coding.
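To illustrate the edge side, the sketch below posts one reading to the Kafka REST proxy over plain HTTP. The proxy host, stream path, field names, and the v2 JSON embedded-format content type are assumptions here, not details from our deployment, so adjust them for the proxy version and stream layout you actually run.

```scala
import java.net.{HttpURLConnection, URL, URLEncoder}
import java.nio.charset.StandardCharsets

object SensorReadingPoster {
  // Hypothetical REST proxy host and MapR stream:topic path -- adjust for your cluster.
  val endpoint: String = "http://restproxy:8082/topics/" +
    URLEncoder.encode("/apps/sensors:raw-input", "UTF-8")

  def send(jsonValue: String): Int = {
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    // Embedded-format content type expected by the Kafka REST proxy (v2 JSON here).
    conn.setRequestProperty("Content-Type", "application/vnd.kafka.json.v2+json")
    conn.setDoOutput(true)
    val body = s"""{"records":[{"value":$jsonValue}]}"""
    val out = conn.getOutputStream
    out.write(body.getBytes(StandardCharsets.UTF_8))
    out.close()
    val status = conn.getResponseCode   // 200 means the record was accepted
    conn.disconnect()
    status
  }

  def main(args: Array[String]): Unit = {
    // One raw reading from the vibration sensor; field names are illustrative only.
    println(send("""{"timestamp": 1501000000, "ax": 0.12, "ay": -0.03, "az": 0.98}"""))
  }
}
```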

The raw sensor data is then scored using a real-time, stateless, streaming microservice. This approach scales very easily: just launch the service on a bunch of servers and each instance will start reading data from the input stream, all using the same consumer group so that the same data is never processed more than once.

On the output side, the predicted state (Failure or OK) is read from the output stream, again via the Kafka REST proxy. These predictions can be used by any number of output systems, such as an operational dashboard; our prototype feeds them to the AR headset.

Microservice Implementation

The prediction microservice is extremely simple:

  1. Consumes messages from the input stream using the Kafka Java API
  2. Does simple ETL to format the raw data into the format expected by the model
  3. Queries the model with the data to get the predicted score
  4. If the score is more than one standard deviation above the mean for normal operation, predicts failure; otherwise, predicts OK
  5. Produces the prediction to the output stream using the Kafka Java API

All this can be done with about one page of Scala code and another page of little facades over the Kafka 0.9 Java API to make it even easier to consume and produce data from the streams.
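As a rough illustration (not our exact production code), that page of Scala could look something like the sketch below, which walks through the five steps with the Kafka 0.9 consumer and producer APIs. The stream and topic names, consumer group id, message format, and threshold value are all hypothetical, and it reuses the anomalyScore helper sketched earlier.

```scala
import java.util.{Collections, Properties}
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object PredictionService {
  // Hypothetical MapR stream:topic names and anomaly threshold (mean + 1 std dev
  // of the reconstruction error on normal data, determined offline).
  val inputTopic  = "/apps/sensors:raw-input"
  val outputTopic = "/apps/sensors:predictions"
  val threshold   = 0.01

  def main(args: Array[String]): Unit = {
    val consumerProps = new Properties()
    // All instances share one consumer group so each reading is scored exactly once.
    consumerProps.put("group.id", "prediction-service")
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    // With vanilla Kafka you would also set bootstrap.servers on the consumer and
    // producer; the MapR Streams client resolves the cluster from its own config.
    val consumer = new KafkaConsumer[String, String](consumerProps)
    consumer.subscribe(Collections.singletonList(inputTopic))

    val producerProps = new Properties()
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](producerProps)

    while (true) {
      // 1. Consume a batch of raw readings from the input stream.
      for (record <- consumer.poll(1000).asScala) {
        // 2. Simple ETL: assume comma-separated numeric readings.
        val row = record.value().split(",").map(_.trim.toDouble)
        // 3. + 4. Score with the exported POJO and apply the threshold.
        val state = if (PojoScoringSketch.anomalyScore(row) > threshold) "FAILURE" else "OK"
        // 5. Publish the predicted state to the output stream.
        producer.send(new ProducerRecord[String, String](outputTopic, state))
      }
    }
  }
}
```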

Making Predictions and How to Avoid the Output “Flickering”

Core Part of Production Code

Above is the core part of the production code, where data is read from the input stream and scored using the H2O model.

One problem we had was that the predictions would “flicker” between the OK and Failure states, which was confusing and likely due to noise. To resolve that issue, we accumulate predictions in a window of 200 sensor reading events and calculate the RMS error of our predictions inside that window. If the RMSE is big enough, we output Failure.
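Here is a small sketch of that smoothing idea, with hypothetical names and a made-up threshold: keep a sliding window of the last 200 reconstruction errors and report Failure only while the RMS over the full window exceeds the threshold.

```scala
import scala.collection.mutable

// Sliding-window smoothing of the raw anomaly scores (names and threshold are illustrative).
class WindowedState(windowSize: Int = 200, rmsThreshold: Double = 0.1) {
  private val errors = mutable.Queue.empty[Double]

  /** Add one reconstruction error and return the smoothed state. */
  def update(reconstructionError: Double): String = {
    errors.enqueue(reconstructionError)
    if (errors.size > windowSize) errors.dequeue()   // keep only the last N readings
    val rms = math.sqrt(errors.map(e => e * e).sum / errors.size)
    // Only report FAILURE once the window is full and its RMS exceeds the threshold,
    // so a few noisy readings cannot flip the output back and forth.
    if (errors.size == windowSize && rms > rmsThreshold) "FAILURE" else "OK"
  }
}
```

In the production loop, the raw anomaly score from the model would be passed through something like this before the prediction is produced to the output stream.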

The full implementation code is available on Mateusz’s GitHub. Please feel free to check it out!

Kafka Configuration and Tuning

We didn’t change anything for the Kafka REST Proxy. The default settings are fine for production use in general.

We stuck to basics with the configuration of the Kafka streams. We use one topic for data input and one for data output. Both use the default settings, which are appropriate for general production use. Partitions are set to 30 for each, 10 per node, which is MapR’s best practice recommendation.

Additional tuning should only be done based on the needs of the application, given the business use case. For this system, we aim for a round trip of three seconds or less from raw data to output in the visualization system.

Not needing any special configuration or tuning to get decent performance was a big win from choosing the MapR platform to run our system.

Conclusion

Scaling Production

As the number of sensors, and thus the data throughput, increases, it can be worthwhile to increase the number of partitions to ensure that the stream makes optimal use of all the nodes of the cluster. The business value of good performance is being able to use the smallest cluster possible that meets the system SLA. This is a good reminder that the benefits of good performance aren’t limited to "it goes really fast."

Our project uses an augmented reality headset to visualize the output of our anomaly detection model. LR RESEARCH’s headset prototype was far enough along in development that it was technically possible to use it.

A major advantage of using a streaming architecture approach is that the output can be easily switched or several visualization systems could be used concurrently, all backed by the scalability that comes with the MapR Converged Data Platform.

We can run many prediction microservices in parallel to process more data. Each one can be flexibly configured to use a different model (or stay with the same one) to adapt for different robots and machines.

We could use more than one stream topic to gather data from multiple sensors at the same time as well, a potentially fruitful improvement that could be rolled out at a future time with minimal production impact.

MapR Converged Data Platform for Operational Workloads

The MapR platform saved us a lot of work when we implemented our system. As soon as the platform itself was installed on a few AWS nodes, we could start using it to push data in and out of streams. There was no need to install a separate Apache Kafka cluster plus a YARN cluster for storage, Spark, and H2O.

The system took four engineers, working part time, less than two months to build, with most of the work going into getting the AR headset working. Had we used a simple Kibana or Grafana dashboard instead, the system could have been done in less than a month.

A few global manufacturing leaders are already in production with very similar systems to ours, at least in terms of capabilities. These systems are built using ultra-expensive custom technology and require the collaboration of a large team over a period of a year or more.

Our system shows that a comparable system can be up and running in a few months with a small team, using well-understood, open source, enterprise big data software. This has profound implications for democratizing access to this cutting-edge, useful technology for any mid-size or large manufacturer.

Industry 4.0 is no longer the exclusive playground of a few large global manufacturers.

This blog post was published July 31, 2017.