A Software Architect's Guide to Building Modern Factory Apps

In his most recent post, my colleague Ronak Chokshi described the evolution of use cases in manufacturing and how the rise of IoT technology has spurred a wave of new investment and innovation. In this post, I'm going to dive deeper into some technical considerations that software architects should weigh when designing infrastructure that can drive improvements in product quality, manufacturing line efficiency, and supply chain accuracy.

Future-Proof by Collecting More Data Than You Think You Need

Most manufacturing infrastructure modernization projects start with a simple goal: open up existing plant data to modern tools, such as business intelligence tools for better reporting or data science tools like R and Python for building predictive models. To achieve these goals, data collection can be as simple as querying the historian for already-collected data points and putting that data into an open format that new tools can access. While this may indeed be a fine way to kick-start data collection, it is dangerous to be too short-sighted and design application infrastructure around this initial need alone.

The use cases that drive the most value in manufacturing are those that rely on artificial intelligence and machine learning to predict equipment failures, detect quality issues in real time, or pinpoint areas of sub-optimal efficiency. These algorithms require data at a frequency that historians are unable to collect. For example, Tupras, a major oil refinery in Turkey, struggled with its historian's 60-second collection interval across 200k sensors and designed an infrastructure that could collect data every second to support its new use cases. Collecting data at shorter intervals means augmenting the existing data pipelines that feed the historian systems and building new collection mechanisms that interact directly with data sources, like sensors, PLCs, and DCSs. Depending on the source, this could be as simple as interfacing with an existing protocol or system, like OPC, SCADA, or MQTT, or it could require writing a custom integration.
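To make the collection side concrete, here is a minimal sketch of a per-second polling loop in Python. The `fake_plc_read` driver and the sensor names are hypothetical stand-ins; a real system would call a vendor's OPC or PLC client library in its place.

```python
import json
import time
from typing import Callable, Dict, List


def collect_readings(read_sensor: Callable[[str], float],
                     sensor_ids: List[str],
                     timestamp: float) -> List[Dict]:
    """Poll every sensor once and tag each reading with the same timestamp."""
    return [{"sensor": sid, "ts": timestamp, "value": read_sensor(sid)}
            for sid in sensor_ids]


def fake_plc_read(sensor_id: str) -> float:
    """Stand-in for a real PLC/OPC driver call; returns a stub value."""
    return 42.0


if __name__ == "__main__":
    # One tick of a once-per-second collection loop.
    batch = collect_readings(fake_plc_read, ["pressure-1", "pressure-2"], time.time())
    print(json.dumps(batch))
```

The useful property of this shape is that the polling logic is independent of the driver, so moving from a stub to a real protocol integration only swaps the `read_sensor` callable.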

In addition to collecting more frequently from existing sources, often it's necessary to add new types of data sources to the factory to support emerging use cases. I wrote an article recently on how video is rapidly gaining popularity for IoT use cases, due to decreases in cost and advancements in image recognition algorithms.

Design for Real-Time Action by Streaming First

The first decision to make when bringing in a data source is whether to write to a bulk object, like a file or database table, or to send individual events into a stream. It may be tempting to choose based on an initial use case (for example, writing to files to make data easier to access from R), but this is dangerous for the same reason that siphoning off of the historian is: you may be limiting your options going forward. Consider this: streaming data can always be written to files, but file data can't be replayed into a stream in real time. Starting with file-based ingestion therefore means giving up future real-time use cases.

A common design pattern for building multiple applications off of a single source of data is called "stream of record." The basic idea is to take source data and push it event-by-event into a stream where it is retained indefinitely. Each new real-time application that comes online simply subscribes to this stream. Each new batch- or file-oriented application can be supported by a lightweight process that pulls data out of the stream and writes it to files optimized for that application. The best part of this approach is that you can add new applications with different requirements at any time, or change your mind about an existing one. A simple example of this technique being used to collect pressure data from instruments in a factory is pictured below.
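The pattern can be sketched in a few lines of Python. The in-memory class below is only an illustration of the fan-out, not a real durable log (a production stream of record would use a persistent, replicated pub/sub system); the pressure threshold and sensor names are invented for the example.

```python
import json
import tempfile
from typing import Callable, Dict, List


class StreamOfRecord:
    """Minimal in-memory stand-in for a durable pub/sub stream."""

    def __init__(self) -> None:
        self.log: List[Dict] = []                       # retained indefinitely
        self.subscribers: List[Callable[[Dict], None]] = []

    def publish(self, event: Dict) -> None:
        self.log.append(event)
        for callback in self.subscribers:               # real-time consumers
            callback(event)

    def subscribe(self, callback: Callable[[Dict], None]) -> None:
        self.subscribers.append(callback)

    def replay(self) -> List[Dict]:
        return list(self.log)                           # batch consumers read history


stream = StreamOfRecord()

# Real-time application: alert on high pressure as events arrive.
alerts: List[Dict] = []

def alert_on_high_pressure(event: Dict) -> None:
    if event["pressure"] > 100:
        alerts.append(event)

stream.subscribe(alert_on_high_pressure)

stream.publish({"sensor": "p-1", "pressure": 95})
stream.publish({"sensor": "p-1", "pressure": 130})

# Batch application: a lightweight job materializes the same stream to a file.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    for event in stream.replay():
        f.write(json.dumps(event) + "\n")
```

Note that the batch file writer and the real-time alerter never coordinate: both are downstream of the same retained log, which is what lets you add or change applications later.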


Use Bandwidth Wisely by "Acting Locally"

Congratulations! Based on the suggestions above, you've built a streaming pipeline that integrates data from 50k sensors every second, plus 2 HD video streams! Assuming 10B per sensor reading, and 5Mbps per video stream, you're now ingesting 14Mbps worth of data. If you're lucky, your factory is in a metropolitan area, and you have a 10Mbps internet connection; if you're not, you may be hanging off of a slow, flaky, high-latency satellite connection. Either way, you're not going to be able to pipe everything to the cloud.
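The back-of-the-envelope arithmetic behind those numbers is worth writing down, since it's the first thing to re-run when your sensor count or video bitrate changes:

```python
def ingest_rate_mbps(sensor_count: int, bytes_per_reading: int,
                     readings_per_sec: int, video_streams: int,
                     mbps_per_stream: float) -> float:
    """Estimated ingest bandwidth in megabits per second."""
    sensor_bits_per_sec = sensor_count * bytes_per_reading * readings_per_sec * 8
    return sensor_bits_per_sec / 1_000_000 + video_streams * mbps_per_stream


# The figures from the text: 50k sensors at 10 B/reading each second,
# plus two 5 Mbps HD video streams.
print(ingest_rate_mbps(50_000, 10, 1, 2, 5.0))  # 14.0
```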

Fear not, the majority of the heavy lifting needed to build your AI-assisted factory applications can happen at the "edge," or the factory premises, by using a technique called "think globally, act locally." This technique is based on the realization that training a machine learning model and using that model are two very different actions that can occur in different locations.

Machine learning model training works best under two conditions:

  1. Availability of lots of compute
  2. Availability of lots of data

As such, the place where both of these conditions are typically met is in the cloud, or a centralized corporate data center. This is where data would have been aggregated from multiple distributed locations and joined with public or existing data, and compute can be summoned with an API call.

On the other hand, models are best deployed close to where the action is—at the edge. This allows insights and actions to be derived as close to event-time as possible and significantly cuts down on data that needs to leave the factory. That isn't to say that no data is leaving the factory—you need new data to train models on in the cloud, but you can strip out the uninteresting data. The diagram below illustrates the relationship between the applications sitting at the cloud and edge.
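A toy version of this split, assuming a deliberately simple anomaly model (a mean-plus-three-sigma threshold, chosen here purely for illustration): training runs centrally over aggregated history, and the edge applies the resulting parameters locally, forwarding only the anomalous readings upstream.

```python
import statistics
from typing import Dict, List


def train_threshold_model(history: List[float]) -> Dict[str, float]:
    """'Cloud' step: fit a simple anomaly threshold (mean + 3 sigma)."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return {"threshold": mean + 3 * stdev}


def edge_filter(readings: List[float], model: Dict[str, float]) -> List[float]:
    """'Edge' step: score locally, forwarding only anomalous readings."""
    return [r for r in readings if r > model["threshold"]]


# Trained centrally over aggregated history...
model = train_threshold_model([10.0] * 99 + [10.5])
# ...then deployed at the edge, where only the outlier leaves the factory.
uploads = edge_filter([10.1, 9.9, 25.0], model)
```

Only the small parameter dictionary crosses the network in one direction, and only filtered, interesting data in the other, which is the whole point of "think globally, act locally."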


Achieve Development Agility by Containerizing

Once you have your factory IoT applications written, the next hurdle is figuring out the operational pieces—how to get the applications deployed, versioned, and secured. Here, there are several challenges to overcome:

  • Dealing with dependency issues on (potentially) specialized hardware at the edge. Factory computing systems may be running specialized OSes or software stacks to support existing applications, making it hard to predict how a new application might run.
  • Deploying new versions of an IoT application in production without disrupting the rest of the infrastructure.
  • Securing IoT applications by preventing users with OS-level access from tampering with application configurations or binaries.

Containerizing applications solves all three of these challenges:

  • Containers ship with dependencies built in, avoiding dependency issues.
  • Versioning containerized applications is simple, as the new version can be launched from an image in parallel with the existing version and verified before being put into production.
  • The container boundary creates a wrapper around application configurations and binaries, making them more secure.
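As a sketch, a containerized collector might be packaged like this; the image base, file names, and user name are all hypothetical placeholders:

```dockerfile
# Hypothetical edge application image.
FROM python:3.11-slim

# Dependencies travel with the image, so the edge host's OS and
# installed libraries don't matter.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY collector.py .

# Run as a non-root user so the application's files inside the
# container aren't writable by the process itself.
RUN useradd --create-home appuser
USER appuser

CMD ["python", "collector.py"]
```

Rolling out a new version then amounts to building a new image tag, starting it alongside the old container, verifying it, and switching traffic over.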

Putting It Together with MapR

Only MapR provides a comprehensive platform for factory IoT applications.

  • MapR has scale-out architecture at both the core and edge, which enables companies to collect as much data as is needed to support new use cases.
  • MapR Event Store offers the industry's only publish/subscribe streaming system that supports "stream of record" architectures and global IoT-scale replication.
  • MapR Edge is the industry's only converged data platform that runs at the edge, allowing a full suite of necessary data services—file system, database, and streaming—to run on a small footprint of shared hardware, enabling IoT applications to take action on data in the moment.
  • MapR provides comprehensive support for Docker containers, including the Persistent Application Client Container (PACC) and storage plugin for Kubernetes, which allows companies to deploy applications in containers with storage backed by MapR, fully preserving application resiliency and availability.

This blog post was published April 23, 2018.
