Dynamic Scaling for Computer Vision with Pub/Sub Messaging and Docker


Face detection opens new quantitative possibilities for a wide range of applications, such as smart office spaces, ad analytics, and surveillance. However, implementing face detection on video is challenging on two fronts:

  1. Video sources generate high throughput data streams, making them difficult to transport and buffer.
  2. Real-time face detection requires fast GPU hardware, whose costs limit the practicality of processing multiple video sources.

Many applications need to simultaneously process multiple videos (e.g., webcam feeds) at near real-time frame rates. Until recently, the assumption has been that a dedicated GPU must be allocated for each video feed. With the advent of containerization, distributed pub/sub messaging services, and elastic GPUs in the cloud, we can architect applications that more efficiently utilize hardware to process multiple high-speed video feeds.

In this blog, I'll describe an architecture that's suitable for detecting faces in multiple video feeds with an efficient use of GPU hardware. This architecture is generally applicable to any application that processes fast data streams and requires the ability to scale by adding more machines without changing the application code.

Architecture Overview

The two components of MapR used for this face detection application are MapR Event Streams and MapR PACC:

  1. MapR Event Streams (MapR Event Store) makes it easier to distribute fast data streams (such as video feeds) to a group of stream consumers. It simplifies the buffering and connection management that's often complicated when you have a dynamic number of concurrent stream consumers. MapR Event Store also allows you to partition streams, so you can split up time-consuming tasks, like face detection, across multiple consumers without redundantly processing duplicated data.
  2. MapR client containers (MapR PACC) for Docker enable workloads to elastically scale in the cloud. This makes it possible to utilize hardware more efficiently and meet SLAs more effectively.
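The partitioning idea in point 1 can be sketched in plain Python. This is a hypothetical illustration, not the MapR API: it hashes each feed's ID to a partition, so frames from one feed always land on the same partition and exactly one consumer in a group processes them, with no duplicated work.

```python
import zlib


def assign_partition(feed_id: str, num_partitions: int) -> int:
    """Map a video feed to a stream partition by hashing its ID.

    Frames from the same feed always hash to the same partition, so
    each feed is handled by exactly one consumer in a consumer group.
    CRC32 is used because Python's built-in hash() is salted per process
    and would not be stable across producers.
    """
    return zlib.crc32(feed_id.encode("utf-8")) % num_partitions


# Four hypothetical webcam feeds spread across three partitions.
feeds = ["webcam-1", "webcam-2", "webcam-3", "webcam-4"]
layout = {feed: assign_partition(feed, num_partitions=3) for feed in feeds}
```

Because the assignment is deterministic, a producer restarted on another machine sends each feed's frames to the same partition as before, and consumers need no coordination beyond their group membership.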

Why Distributed Pub/Sub?

Distributed pub/sub messaging services are typically used for communication between services. When messages are produced faster than they are consumed, these services buffer the unconsumed messages until consumers catch up. As such, they provide a convenient decoupling that allows producers to send data to consumers without the burden of maintaining connections and managing back pressure across a variable number of consumers.
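That decoupling can be mimicked with a toy in-process topic (standard library only; no MapR or Kafka client). In this sketch the producer emits frames faster than the consumer drains them, and the buffer absorbs the difference so neither side blocks on the other's pace. Real systems like MapR Event Store also persist and replicate messages, which this toy deliberately omits.

```python
from collections import deque


class ToyTopic:
    """A minimal in-memory pub/sub topic: producers append, consumers poll.

    The topic buffers whatever the consumers have not yet read, which is
    the core decoupling that pub/sub systems provide.
    """

    def __init__(self):
        self._buffer = deque()

    def publish(self, message):
        self._buffer.append(message)

    def poll(self):
        # Return the oldest unread message, or None if caught up.
        return self._buffer.popleft() if self._buffer else None

    def backlog(self) -> int:
        return len(self._buffer)


topic = ToyTopic()
# A fast producer emits 10 frames up front...
for i in range(10):
    topic.publish(f"frame-{i}")
# ...while a slow consumer has only drained 3 so far.
consumed = [topic.poll() for _ in range(3)]
backlog = topic.backlog()
```

The producer never waits on the consumer; the backlog simply grows until the consumer (or an added consumer) catches up, which is exactly the property that makes pub/sub attractive for bursty video feeds.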

Can Pub/Sub Streaming Really Handle Video?

Pub/sub streaming systems, like Kafka and MapR Event Streams (MapR Event Store), have traditionally been thought of as a solution for inter-service communication rather than as a distribution mechanism for fast data streams, such as video. However, we've proven that video can be distributed through MapR Event Store. In doing so, we see two distinct advantages:

  1. MapR Event Store buffers data between fast producers and slower consumers.
  2. MapR Event Store simplifies the process of broadcasting fast data across a group of concurrent stream processors.
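One practical detail when pushing video through a message stream is that an encoded frame can exceed a comfortable message size. A common workaround, sketched below as a hypothetical illustration rather than code from the demo, is to split each frame into fixed-size chunks on the producer side and reassemble them in order on the consumer side:

```python
def chunk_frame(frame: bytes, chunk_size: int = 64 * 1024):
    """Split one encoded video frame into message-sized chunks."""
    return [frame[i:i + chunk_size] for i in range(0, len(frame), chunk_size)]


def reassemble(chunks):
    """Rebuild the original frame bytes from its ordered chunks."""
    return b"".join(chunks)


# A stand-in for one JPEG-encoded frame (~200 KB of zero bytes).
frame = bytes(200_000)
chunks = chunk_frame(frame)
restored = reassemble(chunks)
```

In a real pipeline each chunk would carry a frame ID and sequence number in its key or headers so the consumer can detect when a frame is complete; partitioned topics guarantee in-order delivery within a partition, which keeps reassembly simple.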

Why GPUs?

Real-time face detection requires high-speed processors. Video processing on traditional CPUs can only be done by significantly compromising the fidelity and timeliness of video frame processing. For example, it takes more than 10 seconds to classify a single image on an Intel Xeon Skylake CPU in a standard Google Cloud virtual machine. That same process takes less than 1 second if an NVIDIA K80 GPU is used.

Figure: Image classification time, CPU vs. GPU (https://raw.githubusercontent.com/mapr-demos/mapr-streams-mxnet-face/master/images/cpu_vs_gpu.png)

Why Docker?

Docker makes it easy to orchestrate video producers and consumers. When they're packaged with the MapR Persistent Application Client Container (MapR PACC), they can maintain access to a MapR cluster and the video streams hosted on that cluster, even though the Docker containers themselves are ephemeral.

By dockerizing the video consumers, face detection processes can be provisioned on the fly when new video feeds demand faster processing. Resource discovery is often a problem in distributed architectures where processes run on different machines, but not here: the MapR global namespace guarantees that the locations of files, tables, and streams never change, so newly launched consumers can always find their streams.
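As a concrete sketch, the scale-out step might look like the following Docker Compose fragment. All names here are illustrative placeholders, not the demo's actual values; the image would be built on the MapR PACC base so each container can reach the cluster's streams.

```yaml
# docker-compose.yml (illustrative sketch, not from the demo repo)
services:
  face-consumer:
    image: my-face-detector:latest       # hypothetical image built on the PACC base
    environment:
      MAPR_CLDB_HOSTS: cldb.example.com  # how the PACC locates the cluster
    command: python consumer.py --stream /apps/video_stream:camera1
    deploy:
      replicas: 3                        # add consumers without changing any code
```

Scaling is then a matter of changing the replica count: each new container joins the same consumer group and picks up a share of the stream's partitions.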

Demo Code

Our face detection demo application was implemented in Python using APIs for MXNet and MapR Event Store. The code and documentation are provided here:
https://github.com/mapr-demos/mapr-streams-mxnet-face.

The functional flow for the various components in this application is shown below:

Demo Video

Watch a video demonstration of this application here:

This blog post was published August 03, 2018.