MapR Clarity for Predictive Maintenance with RapidMiner


With so much buzz around artificial intelligence (AI), machine learning (ML), and advanced analytics, it is sometimes hard to cut through the hype and see the practical applications that can help companies today. One area in which these technologies, along with edge computing and IoT, are being applied successfully is predictive maintenance, which manufacturers use to increase operational efficiency and productivity. According to PriceWaterhouseCoopers, manufacturers’ adoption of AI/ML will increase 38% in the next five years.

What is Predictive Maintenance?

The goal of predictive maintenance (PdM) is to eliminate unplanned shutdowns and minimize the number of corrective or preventive maintenance shutdowns by keeping production equipment in the best possible operating condition, so that it produces manufactured goods to a prescribed quality level. Predictive maintenance accomplishes this by continuously monitoring and evaluating in-service equipment, using statistical process control to determine when the equipment approaches conditions that require maintenance. This can be contrasted with time-based or operational-count maintenance, where a piece of equipment is taken offline and maintained whether it needs it or not. That scheduled approach can result in unnecessary downtime for the entire production line and increased costs for unneeded maintenance.
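As a rough illustration of the statistical process control idea mentioned above, the following Python sketch flags sensor readings that fall outside control limits derived from a healthy baseline. The three-sigma rule, the function names, and the sample readings are illustrative assumptions, not part of any MapR or RapidMiner product.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits at k sample standard deviations from the mean."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread

def out_of_control(readings, limits):
    """Return the indices of readings that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(readings) if x < lower or x > upper]

# Hypothetical vibration amplitudes recorded while the machine was healthy.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(baseline)

# New in-service readings; the third one drifts well above the baseline.
flagged = out_of_control([10.0, 10.1, 11.5, 9.9], limits)
print(flagged)
```

In practice the baseline would come from a much longer window of known-good operation, and a flagged reading would prompt inspection rather than an automatic shutdown.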

The promise of PdM, maximizing the production value of commercial equipment while minimizing both downtime and the cost of preventive maintenance, is becoming a reality thanks to the industrial internet of things (IIoT) and a confluence of forces: the growth of devices that collect diagnostic data about themselves, highly parallelized systems that process massive amounts of data, machine learning to analyze that data, and networks that can absorb and distribute the data for analysis.

PdM systems can provide value in the following common scenarios:

  • Detecting when machines exhibit characteristics associated with past failures
  • Predicting failures preemptively, but efficiently, so equipment can be maintained to optimize uptime
  • Identifying factors that lead to premature failure conditions, so they can be eliminated or minimized, therefore lengthening the life of the equipment
  • Avoiding unnecessary maintenance, which reduces both maintenance costs and the productivity lost to needless planned downtime

Using a Data Platform to Power Predictive Maintenance

As manufacturers look to benefit from the tremendous stream of real-time diagnostics from more and more equipment, they struggle to build an infrastructure that can collect and store the data at full fidelity, correlate that data across operations, and analyze it for anomalies. They also often have to deal with legacy systems that have data trapped in silos, both at remote edge sites and in various systems throughout the corporate infrastructure, that they will want to integrate into the central repository to aid in the predictive algorithms.

When building the infrastructure to collect and store the data effectively and efficiently, they need to consider the following:

  • Rate and volume of the data being accumulated
  • Cost of data storage
  • Need for archival for regulatory compliance purposes
  • Data privacy and security
  • Ability to correlate data across different sources

To give you an appreciation for the amount of data that PdM inference models need to digest, consider this:

  • Volume and velocity of data: Manufacturing processes can be instrumented by devices that measure hundreds of metrics at speeds that sometimes exceed thousands of samples per second (e.g., vibration sensors).
  • Accurately predicting failure events: These events are typically infrequent (e.g., once per month) and could occur due to a variety of reasons.
  • Focus on better training: ML can only predict events that can be generalized from training data. If events are rare, data collection must be that much longer. A good rule of thumb is to train models with datasets that span several hundred events.
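To make these figures concrete, here is a back-of-the-envelope sizing sketch in Python. Every number in it is an illustrative assumption (200 metrics, 1,000 samples per second on every metric, 8 bytes per sample, 10 machines, a target of 300 failure events), chosen only to match the orders of magnitude described above.

```python
def raw_bytes_per_day(num_metrics, samples_per_second, bytes_per_sample=8):
    """Uncompressed bytes produced per day by one instrumented process."""
    seconds_per_day = 24 * 60 * 60
    return num_metrics * samples_per_second * bytes_per_sample * seconds_per_day

def months_to_observe(target_events, events_per_machine_per_month, num_machines):
    """Months of collection needed to see the target number of failure events."""
    return target_events / (events_per_machine_per_month * num_machines)

# Assumed: 200 metrics, each sampled 1,000 times per second, 8 bytes per sample.
daily = raw_bytes_per_day(num_metrics=200, samples_per_second=1000)

# Assumed: each of 10 machines fails about once per month; target is 300 events.
months = months_to_observe(target_events=300,
                           events_per_machine_per_month=1,
                           num_machines=10)

print(f"{daily / 1e9:.2f} GB per day, about {months:.0f} months of collection")
```

Under these assumptions a single process generates on the order of a hundred gigabytes of raw samples per day, and a small fleet needs years of history before the training set spans several hundred events.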

This means that the infrastructure that handles this data must scale in both throughput and storage volume. It must also allow the data to be processed and analyzed in real time at scale.

Once the data is being captured from the machinery and correlated with other data streams, it is time to analyze the data to detect conditions that require maintenance. PdM can be implemented using supervised machine learning by building a model to predict labels based on how those labels are mapped to features in the training data provided to the model. The two most common labels used for PdM are:

  • The probability of failure within the next n steps (e.g., “About to Fail”)
  • The time (or machine cycles) left before the next failure (e.g., “Remaining Useful Life”)
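The two labels above can be derived from a log of past failures. The Python sketch below is a minimal illustration, assuming each machine cycle has an integer index and that the cycles at which failures occurred are known; the function name, the horizon parameter, and the sample data are hypothetical.

```python
def label_cycles(num_cycles, failure_cycles, horizon):
    """Return one (remaining_useful_life, about_to_fail) pair per cycle.

    remaining_useful_life counts cycles until the next recorded failure.
    about_to_fail is True when that failure is at most `horizon` cycles away.
    Cycles after the last recorded failure get (None, None), because the
    next failure has not been observed and the true label is unknown.
    """
    failures = sorted(failure_cycles)
    rows = []
    for cycle in range(num_cycles):
        upcoming = [f for f in failures if f >= cycle]
        if not upcoming:
            rows.append((None, None))
            continue
        rul = upcoming[0] - cycle
        rows.append((rul, rul <= horizon))
    return rows

# Hypothetical history: 8 cycles observed, one failure at cycle 5, horizon n = 2.
labels = label_cycles(num_cycles=8, failure_cycles=[5], horizon=2)
for cycle, (rul, about_to_fail) in enumerate(labels):
    print(cycle, rul, about_to_fail)
```

These labels would then be joined to the sensor features for the same cycles to form the supervised training set. Rows with unknown labels are typically excluded from training rather than treated as healthy.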

In order for data scientists to create models and extract features that can successfully predict conditions that require maintenance, they need a data platform that works with the latest AI and ML tools and gives those tools unfettered access to the data.

The MapR Data Platform with RapidMiner is a Perfect Fit for Predictive Maintenance

The MapR Data Platform helps customers by consolidating all operational data into a single fabric, overcoming pain points around insufficient or siloed data, allowing machine learning to optimize all aspects of their operations. MapR offers the only fully integrated data platform for collecting, storing, distributing, processing, and performing machine learning on IIoT data.

RapidMiner delivers on the promise of accelerating data preparation and developing models quickly for data scientists, using a visual workflow designer.

Ultimately, MapR and RapidMiner deliver improved utilization and efficiency of assets as well as increased production efficiency, leading to higher quality results.

About MapR

MapR Technologies, provider of the industry’s next-generation data platform for AI and analytics, enables enterprises to inject analytics into their business processes to increase revenue, reduce costs, and mitigate risks. MapR addresses the data complexities of high-scale and mission critical distributed processing from the cloud to the edge, IoT analytics, and container persistence. Global 2000 enterprises trust the MapR Data Platform to help them solve their most complex AI and analytics challenges. Amazon, Cisco, Google, Microsoft, SAP, and other leading businesses are all part of the MapR ecosystem. For more information, visit

About RapidMiner

RapidMiner brings artificial intelligence to the enterprise through an open and extensible data science platform. Built for analytics teams, RapidMiner unifies the entire data science lifecycle from data prep to machine learning to predictive model deployment. RapidMiner is a global leader in the application of machine learning in manufacturing and participates on the steering committee of Industry 4.0 in Germany. For more information, visit
