Dataware for data-driven transformation

Kubernetes

A Portable, Open-Source Platform for Managing Containerized Applications and Services

FREE EBOOK

Kubernetes for Machine Learning, Deep Learning,
and AI


WHAT IS KUBERNETES?

Kubernetes is a portable, extensible, open-source platform for managing containerized applications and services that facilitates both declarative configuration and automation. Kubernetes provides a platform to configure, automate, and manage:

  • Intelligent and balanced scheduling of containers
  • Creation, deletion, and movement of containers
  • Easy scaling of containers
  • Monitoring and self-healing abilities
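As an illustration of the declarative model, the sketch below describes a desired state (three replicas of a small web container) in a manifest; Kubernetes then schedules those containers, monitors them, and recreates any that fail. The names and image are illustrative only.

```yaml
# Declarative configuration sketch: describe the desired state and let
# Kubernetes reconcile the cluster toward it (scheduling, self-healing, scaling).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # illustrative name
spec:
  replicas: 3                  # desired number of container replicas
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25      # illustrative image
        ports:
        - containerPort: 80
        resources:
          requests:            # hints the scheduler uses to place the Pod
            cpu: 100m
            memory: 128Mi
```

Changing `replicas` (or the image tag) in the manifest and reapplying it is all that is needed to scale or upgrade; Kubernetes works out the individual container operations.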

A Data Fabric for Kubernetes

See how the MapR volume plugin for Kubernetes can be used to provide a persistent data layer for ephemeral application containers running in Kubernetes.

WHY KUBERNETES?


Challenges with Previous Technologies

  • Before Kubernetes there were containers, which became popular because they simplified moving from application development to deployment without having to worry about portability or reproducibility. Developers can package an application, along with the dependencies, libraries, and configuration files needed to run it, into a container image. A container is a runnable instance of an image. Container images can be pulled from a registry and deployed anywhere a container runtime is installed: on your laptop, on servers on-premises, or in the cloud.
  • Compared to virtual machines, containers have similar resources and isolation benefits, but are lighter in weight because they virtualize the operating system instead of the hardware. Containers are more portable and efficient, take up less space, use far fewer system resources, and can be spun up in seconds.
  • Managing containers for production is challenging. As the container market continued to grow and many workloads transitioned to fully production-grade containers, it was clear cluster admins needed something beyond a container engine. Key capabilities were missing, such as:
    • Using multiple containers with shared resources
    • Monitoring running containers
    • Handling dead containers
    • Moving containers to improve utilization
    • Autoscaling container instances to handle load
    • Making the container services easily accessible
    • Connecting containers to a variety of external data sources
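Kubernetes provides many of these capabilities as declarative objects. As one hedged example of the autoscaling gap listed above, a HorizontalPodAutoscaler can grow and shrink the number of replicas of a Deployment based on observed CPU utilization; the target name and thresholds below are illustrative.

```yaml
# Autoscaling sketch: scale the (hypothetical) "web" Deployment between 2 and 10
# replicas, aiming for roughly 70% average CPU utilization across its Pods.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                          # illustrative target workload
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```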

Advantages of Kubernetes

  • Containers paved the way to build cloud native systems, in which services are implemented using small clouds of containers. This created an enormous opportunity for new services and tooling that make working with containers easier, faster, and far more productive. Since it was open-sourced by Google in 2014, Kubernetes has become the de facto standard for container orchestration. Kubernetes leverages the power of containers while simplifying the management of services and machines in a cluster.
  • Kubernetes clusters abstract their underlying computing resources, allowing users to deploy workloads to the entire cluster rather than to a particular server. A Kubernetes cluster consists of at least one master node, which manages the cluster, and multiple worker nodes, where containerized applications run in Pods.
  • A Pod is a logical grouping of one or more containers, which are scheduled together and share resources. Pods enable multiple containers to run on a host machine and share resources such as storage, networking, and container runtime information (a minimal Pod sketch appears after this list).
  • The Kubernetes architecture enables:
    • A single administrator to manage thousands of containers running simultaneously
    • Workload portability and orchestration of containers across on-premises deployments, public and private clouds, and hybrid environments that span both
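A minimal Pod sketch showing two containers scheduled together and sharing a volume (an application plus a log-forwarding sidecar); the images, paths, and names are illustrative.

```yaml
# Pod sketch: both containers run on the same node and share the "shared-logs"
# volume, so the sidecar can read what the application writes.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  volumes:
  - name: shared-logs
    emptyDir: {}                      # scratch space shared by the containers
  containers:
  - name: app
    image: nginx:1.25                 # illustrative application container
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
  - name: log-forwarder
    image: busybox:1.36               # illustrative sidecar
    command: ["sh", "-c", "tail -F /logs/access.log"]
    volumeMounts:
    - name: shared-logs
      mountPath: /logs
```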

WHY KUBERNETES WITH MAPR?


The Stateful Container Challenge

  • Containers are transient. They have minimal to no overhead and thus can be spun up easily, which makes them very compelling for development and testing environments. The downside is that containers do not hold their own data. Any data present during the lifetime of a container is ephemeral and no longer exists after a container is deleted. In order to run containers in production, data needs to outlast the application processes and should be persisted outside the container to a data platform. Analogous to application orchestration, the orchestration of data used with containerized applications requires dataware that can persist state as input or output of the applications.
  • In Kubernetes, persistent storage can be created independently of any containers and attached on demand when applications are deployed (a minimal claim-and-Pod sketch follows this list). MapR provides this capability using a standard called the Container Storage Interface (CSI). CSI defines the interface between the container orchestration layer (such as Kubernetes) and the dataware (MapR) needed to handle data persistence for stateful containerized applications.
  • MapR's storage plugin for Kubernetes provides the following capabilities to application containers:
    • Containers can mount MapR volumes as a POSIX file system.
    • Containers can create and delete MapR volumes.
    • Containers can create MapR volume snapshots for point-in-time backups.
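A minimal sketch of that on-demand attachment, assuming the cluster administrator has published a storage class backed by the MapR CSI driver (the class name here is an assumption): a PersistentVolumeClaim requests storage independently of any container, and a Pod mounts the claim, so the data outlives the Pod.

```yaml
# Claim storage independently of any container. The storage class name is an
# assumed example that would map to a class served by the MapR CSI driver.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: mapr-csi          # assumed class name, defined by the admin
  resources:
    requests:
      storage: 10Gi
---
# Attach the claim on demand when the application is deployed; deleting the Pod
# does not delete the data behind the claim.
apiVersion: v1
kind: Pod
metadata:
  name: stateful-app
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "date >> /data/run-log.txt && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data                # the MapR-backed volume appears as a POSIX path
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data
```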

Data Persistence for Kubernetes

  • The MapR Data Platform persists files, tables, and streams and works in coordination with Kubernetes to orchestrate the persistence of state for containerized applications.
  • The MapR Data Fabric for Kubernetes does the following:
    • Provides long-lived, persistent storage for pods and their containers
    • Allows containers running in Kubernetes to use the MapR file system for all of their storage needs
    • Allows secure storage of all container states in MapR XD Distributed File and Object Store
    • Addresses the limitations of container use by providing easy and full data access from within and across clouds and on-premises deployments
    • Allows stateful applications to be easily deployed in containers for production use cases, machine learning pipelines, and multi-tenant use cases

MapR CSI driver for Kubernetes

MapR provides persistent data storage for Kubernetes using a driver for the industry-standard Container Storage Interface (CSI). This video demonstrates how to create a persistent volume using static provisioning in Kubernetes with the MapR CSI driver, so that containers can mount MapR volumes as a POSIX filesystem.
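In outline, static provisioning pairs an administrator-created PersistentVolume that references the CSI driver with a PersistentVolumeClaim bound to it by name. The driver name and volume attributes below are assumptions for illustration; the exact values come from the MapR CSI driver documentation and your cluster configuration.

```yaml
# Statically provisioned volume sketch: the admin creates the PersistentVolume
# up front and points it at an existing MapR volume through the CSI driver.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mapr-static-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: com.mapr.csi-kdf            # assumed driver name; verify against your install
    volumeHandle: mapr-static-pv
    volumeAttributes:
      volumePath: "/projects/shared"    # illustrative MapR volume path
      cluster: "my.cluster.com"         # illustrative MapR cluster name
---
# The claim binds to that specific PersistentVolume by name, so applications
# can mount it like any other Kubernetes volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mapr-static-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""                  # empty class prevents dynamic provisioning
  volumeName: mapr-static-pv
  resources:
    requests:
      storage: 100Gi
```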

KEY BENEFITS OF MAPR AND KUBERNETES


EXTREME SCALABILITY

Use the dynamic “hot add” option to scale clusters as the number of containers grows, improving performance.



HIGH PERFORMANCE

Meet the performance SLAs required by containerized enterprise applications by flexibly deploying on NVMe, SSDs, HDDs, or in the cloud.



MULTIPROTOCOL DATA PERSISTENCE

Create, retain, and synchronize data volumes within containers using POSIX, NFS, and S3 protocols under a single global namespace.



HIGH AVAILABILITY

Automatic failover ensures data is always available, so containerized applications can run on a 24x7 basis.



DATA PROTECTION

Protect critical data with mirroring, replication, and consistent point-in-time snapshots.



SECURITY AND MULTI-TENANCY

Use MapR tickets for end-to-end security of containers accessed by multiple users and groups.


WHY KUBERNETES MATTERS TO YOU


DEVELOPERS AND OPERATIONS

  • Containerized workloads can be run on any platform or in any location without any changes to the application’s code.
  • Kubernetes and containers provide greater efficiency for developers. Instead of waiting for operations to provision machines, DevOps teams can quickly package an application into a container and deploy it consistently across different platforms, whether a laptop, a private data center, a public cloud, or a hybrid environment.

DATA SCIENTISTS AND OPERATIONS

  • The use of containers to encapsulate data science jobs provides the valuable benefit of shielding those workloads from the complexity of the underlying technology stack. This ensures the correct and consistent dependencies are in place wherever jobs are run, whether on a developer laptop, in a training environment, or on a production cluster.
  • Combining Kubernetes, microservices, containers, and event streams with DataOps makes managing and evaluating multiple models and deploying new models more efficient and agile (a sketch of a containerized training job follows this list).
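As one hedged sketch of this pattern, a training run can be packaged as a Kubernetes Job whose container image pins its dependencies and whose data lives on a persistent volume claim backed by the data platform. The image, command, and claim name below are hypothetical.

```yaml
# Containerized training job sketch: the same image runs unchanged on a laptop
# cluster, a training cluster, or production, and reads/writes its data on a
# persistent volume rather than inside the container.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
spec:
  backoffLimit: 2                       # retry a failed run up to twice
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/team/trainer:1.0   # hypothetical image with pinned deps
        command: ["python", "train.py", "--data-dir", "/data", "--out-dir", "/data/models"]
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
        volumeMounts:
        - name: training-data
          mountPath: /data
      volumes:
      - name: training-data
        persistentVolumeClaim:
          claimName: app-data           # hypothetical claim backed by the data platform
```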

WHY KUBERNETES WITH THE MAPR DATA PLATFORM MATTERS TO YOU


CIO / ENTERPRISE ARCHITECT

  • Meet line of business data needs at lower cost. Grant fast, secure, multi-tenant access to all data for the full spectrum of analytics needs.
  • Accelerate the business. Support in-place ML/AI and analytics, stateful containerized applications, and much more.
  • Deploy anywhere – in the public cloud, on-premises, at the edge, or all of the above at once.

IT / STORAGE ADMINISTRATOR

  • Built for production. Consistent snapshots, replicas, and mirroring deliver enterprise-grade high availability and disaster recovery.
  • Multi-tenant by design. Assign policies (quotas, permissions, placement) to logical units of management called volumes.
  • Balance cost and performance. Leverage policy-based data tiering, erasure coding, data placement, and more.

DEVELOPERS

  • Persist data for containerized applications. MapR Data Fabric for Kubernetes allows MapR volumes to be mounted for access by containers.
  • Scale data as containers grow. With a “grow as you go” feature, MapR handles growth in data without having to move it to a separate, dedicated environment.

DATA SCIENTISTS

  • Faster time to insight. With support for POSIX, MapR XD works with newer Python-based ML and AI tools like TensorFlow and PyTorch. No need to move the data to a separate cluster.
  • Better support for machine learning logistics. Containerize AI and ML models and train them against all data – not just a subset – leading to more accurate results.

Why Is Kubernetes with MapR Better?

MapR Data Fabric for Kubernetes provides persistent storage for containers and enables the deployment of stateful containers. It addresses the limitations of container use by providing full data access from within and across clouds and on-premises deployments. Now stateful applications can easily be deployed in containers for production use cases, machine learning pipelines, and multi-tenant use cases.

Architecture diagram of k8s running on MapR Platform

The combination of distributed computing, streaming analytics, and machine learning is accelerating the development of next-generation intelligent applications, which take advantage of modern computational paradigms powered by modern computational infrastructure.

The MapR Data Platform combines a fully read/write distributed file system with the unusual features of a built-in NoSQL database and built-in stream transport. Data handled by MapR is directly accessible by AI and analytics tools and legacy programs, as well as via modern open source APIs. The MapR platform serves as dataware for building a comprehensive data system across on-premises, multi-cloud, or hybrid data centers.

The MapR XD Distributed File and Object Store is designed to store data at exabyte scale, support trillions of files, and combine analytics and operations into a single platform. MapR XD supports industry standard protocols and APIs, including POSIX, NFS, S3, and HDFS. Unlike Apache HDFS, which follows a write-once/append-only paradigm, the MapR Data Platform delivers a true read-write, POSIX-compliant file system. Support for the HDFS API enables Spark and Hadoop ecosystem tools for both batch and streaming to interact with MapR XD. Support for POSIX enables Spark and all non-Hadoop libraries to read and write to the distributed data store as if the data were mounted locally, which greatly expands the possible use cases for next-generation applications. Support for an S3-compatible API means MapR XD can also serve as the foundation for Spark applications that leverage object storage.

The MapR Event Store for Apache Kafka is the first big-data-scale streaming system built into a unified data platform and the only big data streaming system to support global event replication reliably at IoT scale. Support for the Kafka API enables Spark streaming applications to interact with data in real time in a unified data platform, which minimizes maintenance and data copying.

MapR Database is a high-performance NoSQL database built into the MapR Data Platform. MapR Database is multi-model: wide-column, key-value with the HBase API, or JSON (document) with the OJAI API. Spark connectors are integrated for both HBase and OJAI APIs, enabling real-time and batch pipelines with MapR Database.

The combination of Kubernetes and the MapR Data Platform forms a powerful pair for taking advantage of application deployment via containers, whether on-premises, in the cloud or across multiple clouds, or in a hybrid on-premises/cloud architecture. Kubernetes provides the orchestration layer for containerized applications, and MapR acts as the dataware needed for data orchestration. In this context, you can think of MapR as being like Kubernetes for data.

MapR for Kubernetes

This video illustrates how the MapR volume plugin for Kubernetes can be used to provide a persistent data layer for ephemeral application containers running in Kubernetes.

CUSTOMERS USING KUBERNETES WITH MAPR

SAP
Société Générale (SocGen)
