Kubernetes and Containers with MapR (Part 4 of 4)

Based on MapR Academy course, BUS – Application Containers and Kubernetes

This fourth and final blog post in the Containers and Kubernetes series discusses how Kubernetes and containers work on MapR. For background, see parts 1, 2, and 3 of the series.

You’re almost done with your introduction to using application containers and Kubernetes. The last part of your journey is to see how to put this all together in a real-world, enterprise environment.

Application Container Example

Throughout this Kubernetes blog series, we’ve used a fictional health care app to demonstrate how application containers and Kubernetes can be used in an enterprise environment. Let’s now take a look at a couple of different ways you can use the tools in the MapR Data Platform to deploy and manage such an app on a single platform and take advantage of unique features available with MapR.

MapR Solution: On-Premises

The scenario in this blog series involves streaming data to a health provider cluster for processing. In this example, the solution means processing the IoT data in an on-premises cluster.

Stream Sensor Data to MapR Cluster

The data is created by a wearable device. An app on that device uses the MapR Event Store service to stream the data to on-premises storage at the customer’s health care system.
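
As a rough illustration of what the device-side app might hand to a Kafka-compatible producer, here is a minimal Python sketch. The record fields (device_id, timestamp_ms, bpm) and the function name are assumptions made for this example and do not come from the MapR course. The sketch only builds the key and value bytes; actually sending them is left to whichever Kafka-style client library is in use.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass
class HeartRateReading:
    """Hypothetical shape of one wearable sensor event (illustrative only)."""
    device_id: str
    timestamp_ms: int
    bpm: int


def to_record(reading: HeartRateReading) -> Tuple[bytes, bytes]:
    """Return (key, value) bytes for a Kafka-style producer.

    Using the device ID as the key means a key-hashing partitioner
    sends all events from one device to the same partition, so they
    stay in order relative to each other.
    """
    key = reading.device_id.encode("utf-8")
    value = json.dumps(asdict(reading), sort_keys=True).encode("utf-8")
    return key, value


# Example: serialize one reading from a hypothetical device.
key, value = to_record(HeartRateReading("watch-001", 1542153600000, 72))
print(key, value)
```

Keeping the payload as plain JSON matches the idea, described in the next section, that the data is stored in its native format.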

Store Data in POSIX-Compliant Distributed File System

The IoT data is streamed directly into the MapR-XD POSIX-compliant distributed file and object store and saved in its native JSON, CSV, or Parquet format.
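
One practical consequence of POSIX compliance is that ordinary file operations work against the store once it is mounted. The Python sketch below appends readings as JSON lines and reads them back using only the standard library. The directory layout, the file naming, and the function names are assumptions for illustration, and a temporary directory stands in for the cluster mount point so the example runs anywhere.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def append_reading(data_dir: Path, device_id: str, reading: dict) -> Path:
    """Append one reading as a JSON line to that device's file."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / f"{device_id}.jsonl"
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(reading) + "\n")
    return path


def read_readings(path: Path) -> List[dict]:
    """Read every JSON line back from a device file."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# A temporary directory stands in for the mounted file system (assumption).
with tempfile.TemporaryDirectory() as mount:
    data_dir = Path(mount) / "wearables" / "raw"
    device_file = append_reading(data_dir, "watch-001", {"bpm": 72})
    append_reading(data_dir, "watch-001", {"bpm": 75})
    print(read_readings(device_file))
```

Nothing in the code is specific to MapR, which is the point: an application written against normal file APIs should not need to change.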

Process Data Natively on MapR

Using both the files saved in MapR-XD and newly arriving streaming data, we can process the data with Spark to compare live information to legacy data saved in the system.
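
To make "compare live information to legacy data" concrete, here is a small sketch of one possible comparison: flag a live heart-rate reading that falls far from a device's historical average. This is plain Python standing in for what a Spark job would do at scale, not Spark code, and the z-score rule, the threshold, and all names are assumptions chosen for the example.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def build_baselines(history: Dict[str, List[int]]) -> Dict[str, Tuple[float, float]]:
    """Compute (mean, population std dev) per device from stored readings."""
    return {dev: (mean(vals), pstdev(vals)) for dev, vals in history.items() if vals}


def is_unusual(bpm: int, baseline: Tuple[float, float], threshold: float = 3.0) -> bool:
    """Return True if a live reading is more than `threshold` std devs from the mean."""
    avg, sd = baseline
    if sd == 0:
        # No historical variation: any different value counts as unusual.
        return bpm != avg
    return abs(bpm - avg) / sd > threshold


# Legacy data as it might be loaded from stored files (made-up values).
history = {"watch-001": [70, 72, 74]}
baselines = build_baselines(history)

# Live readings as they might arrive from the stream (made-up values).
for live_bpm in (73, 120):
    print(live_bpm, is_unusual(live_bpm, baselines["watch-001"]))
```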

In addition, MapR-XD allows for the different sources of data to be tiered, based on level of access, giving faster access to data that is used more frequently.

Kubernetes can spin up containerized machine learning apps on MapR to analyze the data natively as it streams in, all on the same cluster, saving the time of transforming the data before it is processed. In addition, MapR can scale the compute and storage independently. If more data is coming in, you can add more application containers on MapR to support the increased demand.
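
The "add more containers as demand grows" step can be expressed as simple arithmetic. The sketch below uses the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas × current metric / target metric). Whether a given MapR deployment uses that autoscaler is not stated in this series, so treat this as one possible approach. The metric (events per second per container) and the replica bounds are assumptions for illustration, and the autoscaler's tolerance band is left out for brevity.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Scale the replica count in proportion to load, within fixed bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Two containers each seeing 900 events/s against a 500 events/s target.
print(desired_replicas(current_replicas=2, current_metric=900, target_metric=500))
```

Because only the container count changes here, the stored data stays where it is, which is what independent scaling of compute and storage means in practice.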

MapR Solution: Cloud

Alternatively, MapR provides an entirely cloud-based solution for the wearable health care app. In this solution, we containerize the MapR Data Platform and move it to the data, rather than bringing the data to the MapR cluster. This can be far more efficient, because data is processed where it is created, saving the time and resources needed to move it to the cluster.

Stream Data to the Cloud

Wearable IoT devices create data and send it to the cloud, using the MapR Event Store for Apache Kafka service. Choosing a cloud provider close to the device reduces data transfer time.

Deploy Containerized MapR on Cloud

The MapR Data Platform is spun up in a container in the same cloud environment. The MapR Data Platform is broken into microservices, and the cluster can be spun up in just seconds. The MapR Event Store ingests the data into MapR-XD on the cloud.

As clients in other areas create data on their wearable devices, MapR Event Store sends that data to cloud platforms hosted nearby. Containerized MapR clusters are spun up in those cloud environments as well, ingesting the data into MapR-XD.

Process Natively in the Cloud

Spark processes the data close to where it was created, greatly reducing time and resources needed to move the data.

Global Namespace Views Content as a Single Source

The MapR Global Namespace allows all of this data to be processed at its local center, but viewed together as though it were coming from a single cluster, without the need to move any data between clouds or to on-premises storage.
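
From an application's point of view, a single namespace means one path tree that spans several clusters. The sketch below assumes a layout in which each cluster appears as a subdirectory under a common root, in the style of /mapr/<cluster-name>/...; that layout, the cluster names, and the file format are assumptions for illustration. A temporary directory simulates the root so the example runs without a cluster.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def readings_across_clusters(root: Path, relative: str) -> Iterator[Tuple[str, dict]]:
    """Yield (cluster_name, record) for JSON-lines files under root/<cluster>/<relative>."""
    for cluster_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        data_dir = cluster_dir / relative
        if not data_dir.is_dir():
            continue  # this cluster holds no data at that path
        for path in sorted(data_dir.glob("*.jsonl")):
            with path.open(encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        yield cluster_dir.name, json.loads(line)


# Simulate two cloud-hosted clusters under one root (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for cluster, bpm in [("cloud-east", 71), ("cloud-west", 88)]:
        d = root / cluster / "wearables" / "raw"
        d.mkdir(parents=True)
        (d / "watch.jsonl").write_text(json.dumps({"bpm": bpm}) + "\n", encoding="utf-8")
    for cluster, record in readings_across_clusters(root, "wearables/raw"):
        print(cluster, record)
```

The reader iterates over every cluster without copying files between them, which mirrors the "viewed together" idea described above.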

MapR Data Platform Components

The following components of the MapR Data Platform can be used to make our health care app example functional, using application containers and Kubernetes.

  • MapR Data Platform
  • MapR Event Store for Apache Kafka
  • MapR-XD Distributed File and Object Store
  • Cloud Integration
  • Global Namespace
  • Live Data Processing

MapR Data Platform

The MapR Data Platform is a single, complete platform that supports enterprise-level data storage and processing needs.

Just as containers provide a self-contained environment for an app to run as efficiently as possible, the MapR Data Platform provides a single environment for streaming, ingesting, processing, and storing data from the edge, IoT devices, the cloud, on-premises, or any combination of data sources and types.

Just as Kubernetes handles all of the orchestration and maintenance of application containers, the MapR Data Platform handles all of the orchestration, distribution, scale, connectivity, replication, security, and high availability of your data and processing. MapR will take care of the maintenance, and your team can focus on the results.

MapR Event Store for Apache Kafka

MapR Event Store for Apache Kafka provides a platform for streaming data live from IoT devices. With MapR, you can stream data at an enterprise level, easily handling data from all of your customers' wearable devices.

For more information about the MapR Event Store, take a look at the courses available on MapR Academy.

MapR-XD

MapR-XD is a POSIX-compliant distributed file and object store. This allows you to ingest data directly from IoT devices in its native JSON, CSV, or Parquet format, then query that data directly with Apache Drill, without spending time or resources transforming it first. You can also include large binary files like images, audio, or video for something like a security app that monitors a streaming CCTV feed. All of your data can be processed as it streams in, and it gains the high availability, security, and replication advantages of a MapR cluster.

MapR Data Storage Solutions: Cloud Integration

MapR natively supports data storage and application containers on all major cloud providers.

The IoT data from your customer devices can be streamed to a cloud storage environment. From there, it can be accessed for processing from an on-premises cluster or even processed directly, using containers that are deployed in the same, or a different, cloud.

MapR Data Storage Solutions: Global Namespace

MapR provides a global namespace for all these different data sources used on the platform. This global namespace allows all data sources that are used by the application containers on your cluster to be seen as coming from a single source. Therefore, data does not have to be moved or copied, saving valuable time and resources. In addition, live streaming data and data saved in the cloud or on-premises can all be processed together, without the need for any preprocessing or consolidation.

Live Data Processing

In our fictional application, we stream the data to on-premises storage at the customer’s health care system. MapR can spin up machine learning applications to analyze the data as it streams in, all on the same cluster, saving the time of copying or transferring the data. In addition, MapR can scale the compute and storage independently. If more data is coming in, just add more application containers to support the increased demand.

MapR Data Platform

All of these tools in the MapR Data Platform share the same high availability, security, and replication technology, which is consistent across MapR. With the global namespace available with MapR, containerized apps finally have a persistent data source that remains available throughout the lifetime of the cluster.


This blog post was published November 14, 2018.