Predictive maintenance is a great use case for machine learning: data collected from sensors attached to industrial equipment is analyzed for early warning signs of impending failure or suboptimal operation.
There is great business value in a system that can reliably detect equipment failure. A plant could get more use out of expensive machinery by spending less time on unnecessary manual inspections and by scheduling maintenance before breakdowns occur. In addition, well-maintained machines operating optimally require less energy (electricity, fuel, etc.) and produce higher-quality output on average.
One way to implement such predictive maintenance using machine learning is to use an anomaly detection approach.
To demonstrate a working real-time anomaly detection pipeline, we built a model industrial robot arm and programmed it to perform a task repetitively: the arm picks up a container and moves it to another spot. A wireless sensor made by LP-RESEARCH is attached to the robot arm as it performs its task.
In addition, we programmed a failure mode that simulates actuator issues where the arm staggers instead of operating smoothly. This is the signal we want to detect.
Figure 1: Model industrial robot (in red)
The robot is a realistic industrial robot analog and the failure mode is a close match to a common type of equipment failure, based on the extensive industrial robotics experience of the engineers at LP-RESEARCH.
We’re starting the project without any data, a common situation for new enterprise ML projects. This is an easy problem to fix; we can gather data by running the robot on a loop overnight.
However, we also don’t have any a priori knowledge of what pre-failure data looks like. If we did know, we could just code an algorithm to detect this known pattern and we’d be good to go without needing to resort to machine learning.
Machine learning is well suited to this kind of problem. Instead of coding an algorithm that looks for a known pattern, we can train a model on data from normal operation and then raise an alarm when we find a sequence of data that strays too far from the norm. This approach is called anomaly detection, a type of unsupervised machine learning.
Thus we can reduce our problem to real-time anomaly detection on time series data.
Figure 2: Anomaly detection of time series data
Figure 2 gives an idea of the kind of pattern we are looking for. The important point, however, is that a priori we have no idea what the anomalous pattern is going to look like.
The beauty of machine learning is that the algorithm looks for whatever is different; we don't need to tell it what to look for ahead of time. We can therefore think of the model's output as an educated guess based on all the data it saw during training.
The most important initial task of any enterprise machine learning project is not to collect data. It is rather to sit down with the target business user and decide what the system performance needs to be for the project to make sense and provide value.
This is an important collaborative task where the data scientist must give understandable guidance to the business user about what’s possible, what’s easy, and what’s difficult.
In our case, our industrial robot model is programmed with a normal mode and a failure mode. We expect the system to predict the mode correctly within a few seconds.
In a real production project, we'd also agree on minimum thresholds for precision and recall, then measure production performance (actual failures versus predicted failures) to track how the model is doing and adjust it accordingly.
We discussed our problem in Part 1, along with the architecture of our solution for both model training and production deployment. In this post, we discuss the modeling task in more detail. It was done entirely with the H2O AI platform running on the MapR Converged Data Platform, hosted on a cloud service.
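Precision is the fraction of predicted failures that turned out to be real failures; recall is the fraction of real failures that the system predicted. The minimal Python sketch below (Python is used only for illustration; the project itself uses R) assumes each monitoring interval has been labeled with the actual outcome and the model's prediction. The function name and the labels in the usage example are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for binary failure labels (True means "failure")."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)      # caught failures
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)  # false alarms
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)  # missed failures
    # Guard against division by zero when nothing was predicted or nothing failed.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labels for ten monitoring intervals.
    actual = [False, False, True, True, False, True, False, False, True, False]
    predicted = [False, True, True, True, False, False, False, False, True, False]
    p, r = precision_recall(actual, predicted)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Tracking both numbers matters because they trade off against each other: a system that alarms constantly has perfect recall but poor precision, and one that never alarms has no false positives but misses every failure.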
We chose H2O for this problem for the following reasons:
Let’s review all the steps that took us from raw data to a working, well-performing model that meets our initial performance target.
Figure 3: Raw data sample
The raw data produced by the sensor is shown in Figure 3. The sensor can collect a wide range of different measurements. For this project, we focused only on linear acceleration along the X, Y, and Z axes, so all other measurements are left at zero in the data we collect.
In a production system, we would want to capture the data from all measurements. At a future time, data scientists could analyze the data in different ways and develop better performing models over time. Storage cost is sufficiently low in modern distributed platforms that keeping raw data is considered a standard best practice.
These acceleration measurements are centered at 0 and, given the robot arm’s movements, typically vary between -1 and 1.
Figure 4: Signal amplitude as a time series
Here we can see a simple plot of the linear acceleration in the X axis as a time series in signal amplitude versus time in milliseconds. As we can see, the data is very noisy. Always expect real-world data to be noisy and initially non-obvious.
Figure 5: Autoencoder Neural Net (http://philipperemy.github.io/anomaly-detection)
The algorithm H2O uses for anomaly detection is called an autoencoder. Put simply, an autoencoder is a neural network that compresses data into a simpler form and then decompresses it into a reconstructed version of the original data. In more mathematical terms, it’s an approximation of the identity function: $x \rightarrow x'$, where the reconstruction $x'$ should be as close as possible to the input $x$.
This property is useful for anomaly detection because we train the autoencoder only on normal data, so it learns a very good approximation of the identity function for normal data. In other words, normal input data leads to an output with a small reconstruction error. When the input is unusual, the output will likely be far off, because the model is only good at reconstructing normal data. This leads to a larger, and hopefully detectable, reconstruction error.
For a more detailed explanation of autoencoders for anomaly detection, please check out this blog post by Philippe Remy. He strikes a good balance between practical reasoning and theory, backing it up with some of the mathematics.
Our solution was built with H2O’s R API. We chose R because it’s easy to get started with and lets us focus on the data science problem without getting caught up in coding details. A great advantage of H2O is that models train just as fast regardless of which API is used, since everything ends up executed by its own highly optimized internal Java engine.
The key idea in our solution is a sliding window over the data that uses three consecutive observations at a time as input. From each window, n-grams are generated, which show the algorithm the time dependency from one observation to the next.
Next, the model output isn’t used as-is. A standard approach in anomaly detection is to take the reconstruction error and set a threshold at N times its standard deviation. If the error is above this threshold, we count it as an anomaly. The threshold is chosen to minimize false positives while still leaving room to detect a useful number of anomalies.
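To make the reconstruction-error idea concrete, here is a toy sketch in Python (for illustration only; the actual model is built with H2O from R). It measures error as the mean squared difference between an input $x$ of $d$ values and its reconstruction $x'$, that is $e(x) = \frac{1}{d}\sum_{i=1}^{d}(x_i - x'_i)^2$; this choice of error measure is an assumption, since the post does not specify one. Instead of H2O's nonlinear network, the hypothetical `LinearAutoencoder` class has a single linear hidden unit with tied encoder/decoder weights, set to the first principal direction of the training data (found by power iteration rather than gradient descent). That direction is the subspace an optimally trained one-unit linear autoencoder spans, so the toy behaves like a trained one. The synthetic "normal" data, also hypothetical, lies close to a single line in 3-D space.

```python
import math
import random
from typing import List, Sequence, Tuple


def make_normal_data(n: int, seed: int = 0) -> List[List[float]]:
    """Synthetic 'normal' data: points near the line through (1, 2, 3)."""
    rng = random.Random(seed)
    direction = (1.0, 2.0, 3.0)
    rows = []
    for _ in range(n):
        t = rng.uniform(-0.3, 0.3)
        rows.append([t * c + rng.gauss(0.0, 0.01) for c in direction])
    return rows


def first_principal_direction(rows: Sequence[Sequence[float]],
                              iterations: int = 100) -> Tuple[List[float], List[float]]:
    """Return (mean, unit vector of the first principal direction) via power iteration."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v


class LinearAutoencoder:
    """Toy autoencoder: one linear hidden unit with tied encoder/decoder weights."""

    def fit(self, rows: Sequence[Sequence[float]]) -> "LinearAutoencoder":
        self.mu, self.u = first_principal_direction(rows)
        return self

    def encode(self, x: Sequence[float]) -> float:
        # Compress d values into a single code value (projection onto u).
        return sum((xi - mi) * ui for xi, mi, ui in zip(x, self.mu, self.u))

    def decode(self, code: float) -> List[float]:
        # Expand the code back into d values.
        return [mi + code * ui for mi, ui in zip(self.mu, self.u)]

    def reconstruction_error(self, x: Sequence[float]) -> float:
        # Mean squared difference between x and its reconstruction x'.
        x_rec = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


if __name__ == "__main__":
    ae = LinearAutoencoder().fit(make_normal_data(500))
    print("normal input error:   ", ae.reconstruction_error([0.1, 0.2, 0.3]))
    print("anomalous input error:", ae.reconstruction_error([0.3, -0.3, 0.1]))
```

A point on the learned line reconstructs almost perfectly, while a point off the line loses most of its information when squeezed into one code value and comes back with a much larger error. H2O's nonlinear autoencoder applies the same principle to more complex patterns.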
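One way to picture the sliding window is the Python sketch below (illustrative only; the post's code is R). It assumes each observation is an (x, y, z) linear-acceleration triple, that windows advance one observation at a time, and that each window's observations are concatenated into a single feature row, so three observations give nine columns. The stride and row layout are assumptions, and `sliding_windows` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Observation = Tuple[float, float, float]  # (x, y, z) linear acceleration


def sliding_windows(series: Sequence[Observation], size: int = 3) -> List[List[float]]:
    """Turn a time series into overlapping windows of `size` consecutive
    observations (stride 1), each flattened into one feature row."""
    if size < 1:
        raise ValueError("size must be >= 1")
    rows: List[List[float]] = []
    for start in range(len(series) - size + 1):
        row: List[float] = []
        for obs in series[start:start + size]:
            row.extend(obs)  # append x, y, z of this observation
        rows.append(row)
    return rows


if __name__ == "__main__":
    series = [(0.1, 0.0, -0.2), (0.2, 0.1, -0.1), (0.0, 0.3, 0.0), (-0.1, 0.2, 0.1)]
    for row in sliding_windows(series):
        print(row)
```

Because consecutive rows overlap, each row carries the short-term ordering of the signal, which is what lets a model that scores one row at a time pick up time dependencies.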
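The thresholding rule can be sketched in a few lines of Python (illustrative only). The text says "N times the standard deviation"; this sketch assumes the threshold is measured from the mean error, i.e. mean + N standard deviations, and uses the population standard deviation. Both choices are assumptions, and `anomaly_threshold` and `flag_anomalies` are hypothetical names.

```python
import statistics
from typing import List, Sequence


def anomaly_threshold(errors: Sequence[float], n_std: float = 2.0) -> float:
    """Assumed form: mean reconstruction error plus n_std population standard deviations."""
    return statistics.mean(errors) + n_std * statistics.pstdev(errors)


def flag_anomalies(errors: Sequence[float], threshold: float) -> List[bool]:
    """True for every error strictly above the threshold."""
    return [e > threshold for e in errors]


if __name__ == "__main__":
    # Hypothetical reconstruction errors; the last one is unusually large.
    errors = [0.01, 0.02, 0.015, 0.012, 0.018, 0.2]
    t = anomaly_threshold(errors, n_std=2.0)
    print(f"threshold={t:.4f}", flag_anomalies(errors, t))
```

Raising `n_std` reduces false positives at the cost of missing subtler anomalies. Note also that large errors in the data used to compute the threshold inflate the standard deviation itself, which is one reason to compute it on data believed to be normal.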
Build n-grams from a sliding window over the data.
Read input data and send it to H2O.
Use the deep learning algorithm in autoencoder mode to train the model.
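    library(h2o)

    # One hidden layer of 50 neurons holds the compressed representation
    neurons <- 50

    # Deep learning in autoencoder mode: the network learns to reconstruct
    # its input. iot.hex is the training data already loaded into H2O.
    iot.dl <- h2o.deeplearning(
      model_id = "iot_dl",
      x = 1:(ncol(iot)),       # use every column as an input feature
      training_frame = iot.hex,
      autoencoder = TRUE,
      hidden = c(neurons),
      epochs = 100,
      l1 = 1e-5,               # L1 and L2 weight regularization
      l2 = 1e-5,
      max_w2 = 10,             # cap on squared sum of incoming weights per unit
      activation = "TanhWithDropout",
      initial_weight_distribution = "UniformAdaptive",
      adaptive_rate = TRUE     # ADADELTA adaptive learning rate
    )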
Make predictions on the training data, where we empirically define an anomaly as an error output more than double the standard deviation.
Test the model and make some validation plots.
Export the final model as a POJO (Plain Old Java Object) for deployment to production.
The full code listing is available on GitHub.
Figure 6: X axis prediction results
The plots tell the story. Even with noisy data, we could identify the anomalous signal correctly. What’s interesting here is that it is not a simple question of the raw signal’s amplitude.
The robot is vibrating all the time during normal operation. Transitions from one direction (up-down) to another (left-right) generate even more noisy spikes. Those are all normal and should not be labeled as failure.
What helps us is that we aren’t limited to the X axis: our model is trained on the combined data of the X, Y, and Z axes. When all three axes are considered together, the anomalous signal becomes easier to pick out, and the model consistently distinguishes between the normal mode and the failure mode within a second or two.
Figure 7: X, Y, and Z axis data combined
Our best results were empirically obtained with a time window of 200ms and a threshold of one standard deviation.
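The best window is given here in milliseconds, while the n-gram description earlier counts observations. How many observations a time window holds depends on the sensor's sampling rate, which the post does not state. The hypothetical Python helper below makes the conversion explicit, assuming a fixed sampling rate; the rates in the usage example are illustrative, not the LP-RESEARCH sensor's actual rate.

```python
def window_in_samples(window_ms: float, sample_rate_hz: float) -> int:
    """Number of consecutive observations covered by a window of window_ms
    milliseconds at a fixed sampling rate (rounded, at least one sample)."""
    if window_ms <= 0 or sample_rate_hz <= 0:
        raise ValueError("window_ms and sample_rate_hz must be positive")
    return max(1, round(window_ms * sample_rate_hz / 1000.0))


if __name__ == "__main__":
    for rate in (15, 50, 100):  # hypothetical sampling rates in Hz
        print(f"{rate} Hz -> {window_in_samples(200, rate)} observations per 200 ms")
```

Expressing the window in time rather than in a fixed observation count keeps the model's input span the same if the sensor's sampling rate is changed later.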
Next, we’ll look at the code we used to score fresh raw data with the model we just built. Scoring needs to happen in real time, with the predictions sent to a visualization or to another system that makes use of them.
We look at how we did this in the third and final installment. See you there!