In my previous post, I showed you an Internet of Things demo that we presented at Hadoop Summit, which performed IoT stream analysis using MapR Event Store and Spark Streaming. I displayed the results in Grafana, which is a fantastic tool for building analytics dashboards. Since some of the terminology that I threw at you might have been a little confusing, I’d like to talk about what was going on in the demo and how it could have a significant impact on the $2 trillion oil and gas industry.
The demo showed a simple time series visualization that tracked the change in temperature of water contained in a glass. Time series charts are so common that we may not even think about them. After all, sensors like barometers, heart rate monitors, and seismographs have been producing time series plots on paper since long before there were computers. But what’s exciting about the visualization in our demo is that each data point is actionable. When those data points are combined with data from other sensors, data scientists can write rich algorithms that trigger alerts or take actions based upon changes that happen over time.
Let’s look at an example of how that could change things in the process-intensive oil and gas industry.
Oil and gas wells produce a huge amount of information. Sensors monitor things like temperature, pressure, fluid viscosity, the presence of foreign substances, and seismic activity. Sensors must be monitored in real time to optimize both performance and safety. A slight change in pressure underground may indicate a fracture that can jeopardize the whole well.
Each oil well is unique. The factors that indicate trouble in one well may be nothing to worry about in another. Conventional sensors are good at alerting operators to changes at a given point in time, such as a spike in temperature. However, they lack the analytical foundation to perform calculations on time series data. False alerts are just one of the problems that result.
A better approach is to capture data over time and create alerts or notifications based upon conditions that are unique to each well. For example, a temperature increase over the course of an hour may be of no concern, even if the temperature crosses a certain threshold. On the other hand, a sudden increase may be concerning even if it doesn’t cross a threshold. Conventional sensor-based monitoring can’t take action on these kinds of trends, but a streaming analytics engine can.
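To make the difference between a threshold and a trend concrete, here is a minimal Python sketch of a rate-of-change check over a sliding time window. It is not the code from the demo: the class name `RateOfChangeMonitor`, its parameters, and the sample numbers are all hypothetical, and in practice the window and limit would be tuned for each well.

```python
from collections import deque


class RateOfChangeMonitor:
    """Flags a reading when the value has risen too far within a time window.

    Hypothetical helper for illustration only. window_seconds and max_rise
    are per-well settings that an operator would choose.
    """

    def __init__(self, window_seconds, max_rise):
        self.window_seconds = window_seconds
        self.max_rise = max_rise
        self.readings = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record a reading and return True if it should raise an alert."""
        self.readings.append((timestamp, value))
        # Discard readings that have fallen out of the window.
        while timestamp - self.readings[0][0] > self.window_seconds:
            self.readings.popleft()
        lowest = min(v for _, v in self.readings)
        return value - lowest > self.max_rise


# A slow climb of 6 degrees spread over an hour: no alert is raised.
slow = RateOfChangeMonitor(window_seconds=300, max_rise=5.0)
slow_alerts = [slow.add(t, 80.0 + t / 600.0) for t in range(0, 3601, 600)]

# A jump of 8 degrees within one minute: the second reading raises an alert.
fast = RateOfChangeMonitor(window_seconds=300, max_rise=5.0)
fast_alerts = [fast.add(0, 70.0), fast.add(60, 78.0)]
```

In the slow case the temperature ends higher than in the fast case, yet only the fast case raises an alert, because the rule looks at how quickly the value moved rather than where it ended up.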
You can see that basing decisions on multiple data points can be much more powerful than basing them on thresholds. Things really get interesting when you can combine multiple streams from different sensors. For example, increases in both pressure and temperature over a defined period of time may indicate a bigger problem than a change in either metric measured in isolation.
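A rule that combines two sensor streams might look something like the following Python sketch. The function names, the limits, and the sample readings are assumptions made up for illustration, and each series is assumed to be a list of (timestamp, value) pairs already sorted by time.

```python
def rise_over_window(series, now, window_seconds):
    """Return the latest value minus the earliest value inside the window.

    `series` is assumed to be sorted by timestamp. Returns 0.0 when the
    window holds fewer than two readings.
    """
    recent = [v for t, v in series if now - window_seconds <= t <= now]
    if len(recent) < 2:
        return 0.0
    return recent[-1] - recent[0]


def combined_alert(pressure, temperature, now, window_seconds,
                   pressure_limit, temperature_limit):
    """Alert only when BOTH metrics rose past their limits in the same window."""
    pressure_rise = rise_over_window(pressure, now, window_seconds)
    temperature_rise = rise_over_window(temperature, now, window_seconds)
    return pressure_rise > pressure_limit and temperature_rise > temperature_limit


# Hypothetical readings taken one minute apart.
pressure = [(0, 100.0), (60, 104.0), (120, 109.0)]
rising_temperature = [(0, 80.0), (60, 82.0), (120, 85.0)]
flat_temperature = [(0, 80.0), (60, 80.0), (120, 80.5)]

both_rising = combined_alert(pressure, rising_temperature, now=120,
                             window_seconds=120, pressure_limit=5.0,
                             temperature_limit=3.0)
pressure_only = combined_alert(pressure, flat_temperature, now=120,
                               window_seconds=120, pressure_limit=5.0,
                               temperature_limit=3.0)
```

Here `both_rising` is true and `pressure_only` is false. In a real deployment a streaming engine would evaluate this kind of rule continuously over windows of incoming data rather than over in-memory lists.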
Energy companies have been able to buy expensive specialized equipment to do time series analysis, but our demo showed that off-the-shelf hardware and open source software change the economics completely. Let’s look at the components we used in our demo.
Off-the-shelf sensors are common and inexpensive. We used a basic digital thermometer in our demo.
Microcontrollers are key hardware components in applications like this. These lightweight devices use mass-produced, inexpensive microprocessors that are well-suited to many embedded applications, such as collecting sensor data. Our demo used the Arduino microcontroller, which is fully open source. Although it can perform basic computation and filtering, in a typical streaming architecture we use this type of inexpensive device to host the lightweight MQTT client.
MQTT is the lightweight messaging protocol we used. It’s an extremely efficient publish-and-subscribe communications vehicle for use in bandwidth-constrained conditions, such as a remote oil field.
MapR Event Store is a reliable software platform for data ingestion, transport, and buffering when used with stream processing frameworks such as Apache Spark, Storm, Apex, and Flink. It’s like Apache Kafka, but more resilient.
Spark Streaming and Flink are popular open-source analytics frameworks for performing analytics on streaming data. We used Spark Streaming as the processing engine in our demo.
MapR Database provides persistent storage of streamed data after it’s been processed by streaming analytics engines. MapR Database can be used with the OpenTSDB API to simplify storing and analyzing time series data.
Grafana is an open source data visualization package that we used to display the time series graphics. It is a highly customizable visualizer for creating dashboards that show charts and graphs of the time series information you need. You can also share dashboards with others.
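To give a feel for what a stored reading looks like, here is a small Python sketch that shapes one temperature sample in the JSON form accepted by OpenTSDB's HTTP put endpoint (metric, timestamp, value, and at least one tag). The metric name, tag names, and values are hypothetical, and whether the demo wrote its data through this exact path is an assumption on my part.

```python
import json


def to_opentsdb_point(metric, timestamp, value, tags):
    """Build one data point in the shape used by OpenTSDB's /api/put endpoint.

    The metric and tag names passed in below are made up for illustration.
    """
    if not tags:
        raise ValueError("OpenTSDB expects at least one tag per data point")
    return {
        "metric": metric,
        "timestamp": int(timestamp),  # seconds since the Unix epoch
        "value": value,
        "tags": dict(tags),
    }


point = to_opentsdb_point(
    metric="well.temperature",
    timestamp=1467331200,
    value=81.5,
    tags={"well": "well-17", "sensor": "thermo-1"},
)

# The put endpoint accepts either a single point or a list of points.
body = json.dumps([point])
```

Tags are what make the per-well analysis described earlier practical: a dashboard or query can select every series tagged with a particular well without knowing the individual sensor names in advance.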
Put all these pieces together and you get an extremely powerful stream processing engine at a tiny fraction of the cost of dedicated monitoring systems. Microcontrollers can be programmed to do some of the processing at the edge, filtering out unneeded data and passing only what is needed on to the network. Operators can get up-to-the-second status reports and create sophisticated algorithms that react to problems without human intervention.
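One simple form of edge filtering is a deadband: forward a reading only when it differs enough from the last value that was sent. The Python sketch below illustrates the idea. The function name and the sample readings are hypothetical, and on an actual microcontroller the same logic would be written in the device's own language.

```python
def deadband_filter(readings, min_change):
    """Yield only readings that moved at least min_change since the last one sent.

    `readings` is an iterable of (timestamp, value) pairs. The first reading
    is always forwarded so that downstream consumers have a starting point.
    """
    last_sent = None
    for timestamp, value in readings:
        if last_sent is None or abs(value - last_sent) >= min_change:
            last_sent = value
            yield timestamp, value


# Hypothetical raw samples: most of them are small jitter around 20 degrees.
raw = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 21.0), (4, 21.1), (5, 19.9)]
forwarded = list(deadband_filter(raw, min_change=0.5))
```

With these samples, three of the six readings are forwarded: the first one, the jump to 21.0, and the drop to 19.9. The trade-off is that small, slow drifts are invisible downstream until they add up to the chosen `min_change`, so the setting has to be chosen with the analytics in mind.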
This is only one example of how streaming analytics can change the game for an entire industry. In future posts, we’ll drill down into other powerful examples.