CSI, Kubernetes, and Dataware: Data Storage for Containerized Applications Just Got Easier

Put your applications in a container, and life gets better.

That's a widespread idea with serious grounding in reality. Running applications in containers is now commonplace, driven by the benefits of portability (to any cloud or on-premises location), a predictable and repeatable runtime environment (which reduces the chance of surprises when you deploy), and isolation (a big plus for avoiding interference in multi-tenant settings). And as more people use containers, there's increasing recognition that those containerized applications need data persistence. Containers are ephemeral – that's part of their attraction – but the lifecycle of data can and should outlast the application processes.

Two important aspects of working effectively with containerized applications are:

  • orchestration of the applications
  • orchestration of the data

For the application orchestration layer, Kubernetes has emerged as overwhelmingly the most popular choice; other options include Docker Swarm and Mesosphere Marathon. Kubernetes is an open source system that originated at Google and is now maintained by the Cloud Native Computing Foundation. Since the release of Kubernetes 1.0 in the summer of 2015, adoption has grown rapidly. At the time of writing, the current version of Kubernetes is 1.13.

Analogous to application orchestration, the orchestration of data used with containerized applications requires dataware that can persist state as input or output of the applications. Having a data persistence layer working with container orchestration means you're not limited to running just stateless applications in containers, which makes containerization much more practical in real-world settings. The data orchestration should be easy to use and highly scalable. Dataware from MapR does just that. It acts like Kubernetes for data, providing highly scalable and reliable data persistence. And doing so just got easier: now the MapR Data Platform supports CSI, the open standard Container Storage Interface.

How Does CSI Help?

The Container Storage Interface grew out of an open community effort, with early input from Docker, Mesosphere, Cloud Foundry, Kubernetes, Google, and Dell, as a way to make container usage more flexible, smoother, and more widely adopted. CSI is the standard interface between your choice of application orchestration layer and the data persistence layer, as shown in the following figure:

The Container Storage Interface (CSI) defines the interface between the container orchestration layer (such as Kubernetes) and the dataware (MapR) needed to handle data persistence for stateful containerized applications.
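To make the idea of a standard interface concrete, here is a minimal Python sketch. It is not the CSI specification, which is a gRPC API with its own services and messages, and it is not MapR's implementation. The class and function names (`StorageDriver`, `InMemoryDriver`, `schedule_pod`) are hypothetical and exist only for illustration. The point it shows is that the orchestration side talks only to the interface, so any storage backend that implements it can be plugged in, and a volume can outlive the pod that first used it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict


class StorageDriver(ABC):
    """Hypothetical, heavily simplified stand-in for a CSI-style interface."""

    @abstractmethod
    def create_volume(self, name: str, size_gib: int) -> str:
        """Provision a volume (or return the existing one) and give back its ID."""

    @abstractmethod
    def publish_volume(self, volume_id: str, target_path: str) -> None:
        """Make the volume available at a path that a container can mount."""


@dataclass
class InMemoryDriver(StorageDriver):
    """Toy backend that only records what it was asked to do."""

    volumes: Dict[str, int] = field(default_factory=dict)
    mounts: Dict[str, str] = field(default_factory=dict)

    def create_volume(self, name: str, size_gib: int) -> str:
        volume_id = f"vol-{name}"
        # Idempotent: asking again for the same name returns the same volume.
        self.volumes.setdefault(volume_id, size_gib)
        return volume_id

    def publish_volume(self, volume_id: str, target_path: str) -> None:
        if volume_id not in self.volumes:
            raise KeyError(f"unknown volume: {volume_id}")
        self.mounts[target_path] = volume_id


def schedule_pod(driver: StorageDriver, pod: str, claim: str, size_gib: int) -> str:
    """Orchestrator side: it depends on the interface, never on a concrete backend."""
    volume_id = driver.create_volume(claim, size_gib)
    target_path = f"/pods/{pod}/volumes/{claim}"
    driver.publish_volume(volume_id, target_path)
    return target_path


driver = InMemoryDriver()
first = schedule_pod(driver, "trainer-0", "training-data", 10)
second = schedule_pod(driver, "trainer-1", "training-data", 10)
print(first)                                          # /pods/trainer-0/volumes/training-data
print(driver.mounts[first] == driver.mounts[second])  # True
```

Both pods end up attached to the same volume, which is the property that matters for stateful applications: the pod can come and go, but the data stays where it is.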

It's been over a year since MapR announced its data fabric for Kubernetes, which lets MapR users take advantage of Kubernetes to run containerized applications that need data persistence. Now MapR has taken this a step further by implementing CSI to make life in the container world even easier and to provide even more flexibility.

Here's why it matters. The previous Kubernetes-specific in-tree plugin architecture allowed persistence to the MapR Data Platform, but it was coupled to Kubernetes core releases. A CSI plugin, by contrast, is developed and shipped outside the Kubernetes core, so it can be customized, and that's what MapR has done. With the freedom that CSI provides, MapR can update or extend the capabilities of its plugin as needed, without having to wait for a new Kubernetes core release.

MapR's implementation of CSI is part of the new release of the MapR Ecosystem Pack (MEP 6.1). MEP releases let MapR customers upgrade their open source ecosystem stack independently of the core MapR Data Platform. The newly released CSI plugin for MapR makes data persistence from containerized applications easy and works smoothly in concert with Kubernetes as it orchestrates container location and resources.

POSIX Basic functionality is the default with the MapR CSI implementation, along with CentOS 7 as the operating system for the container image. With the new interface, Kubernetes can now set up and trigger MapR Snapshots directly. A great example of how that is useful shows up in DataOps, where you might want Kubernetes to trigger a snapshot of your training data as part of your data processing pipeline.

Furthermore, with this new implementation, MapR users are not bound to CentOS 7. They can choose any container image operating system and version that runs MapR. We provide a template that lets users build a customized version of the CSI container with their choice of operating system and version.
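As a rough illustration of the snapshot scenario described above, the following Python sketch builds two Kubernetes manifests: a PersistentVolumeClaim for the training data and a VolumeSnapshot that refers to it. It makes several assumptions. It uses the `snapshot.storage.k8s.io/v1` field layout found in current Kubernetes releases, whereas snapshot support around Kubernetes 1.13 was still alpha and used different field names. The object names, storage class, and snapshot class (`training-data`, `example-csi-storage`, `example-csi-snapshots`) are placeholders, not names defined by MapR; the real class names come from whatever your cluster administrator has set up for the CSI plugin.

```python
import json


def pvc_manifest(name: str, storage_class: str, size: str) -> dict:
    """Claim for persistent storage; storage_class is a placeholder name."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def snapshot_manifest(name: str, pvc_name: str, snapshot_class: str) -> dict:
    """Snapshot request for an existing claim; snapshot_class is a placeholder."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


if __name__ == "__main__":
    claim = pvc_manifest("training-data", "example-csi-storage", "10Gi")
    snap = snapshot_manifest(
        "training-data-before-run", claim["metadata"]["name"], "example-csi-snapshots"
    )
    # kubectl accepts JSON as well as YAML, so each document can be applied as-is.
    print(json.dumps(claim, indent=2))
    print(json.dumps(snap, indent=2))
```

A pipeline step could generate the snapshot manifest just before a training run and hand it to `kubectl apply -f -`, so that the exact state of the training data is preserved alongside the model it produced.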

A CSI implementation makes sense because it fits with the overall MapR style: providing ease of management and the greatest range of choices for users. Dataware from MapR lets users work from edge to core, from on-premises to any cloud, or to multiple cloud deployments. Open APIs and direct access to data stored in the MapR Data Platform also support a huge range of machine learning and AI options, all of which helps users avoid vendor lock-in.

Finally, MapR users who are just starting to leverage containers find this approach particularly convenient because applications are typically converted to containers one by one. You can't wave a wand and instantly manage all applications with Kubernetes. Indeed, most organizations continue to work with a mixture of containerized and non-containerized applications. MapR users like the fact that they can keep their data fully accessible on MapR while migrating to containerized workflows.

Kubernetes plus the MapR Data Platform, with the help of CSI, really does make life in development and in production better.

This blog post was published February 06, 2019.
