By now it is well understood that containers are transient in nature: stateless and lightweight. Because of this, containers carry minimal to no overhead and can be spun up easily, which makes them very compelling for development and testing environments. The downside is that containers do not hold their own data: any data present during the lifetime of a container is ephemeral and no longer exists after the container is deleted. To run applications in containers in production, data needs to be persisted. Here are a few options.
Docker offers different ways to store data. To understand the options, it helps to know how the Docker filesystem works. Each Docker container is created from a read-only Docker image, which forms the base of the container. To handle writes, Docker adds a read-write layer on top of the read-only layers; this strategy is referred to as copy-on-write. Docker stacks several such layers into a single unified view, hence the name union file system. These two concepts, layers and copy-on-write, form the basis of the Docker filesystem. When a container is deleted, its read-write layer is deleted, but the underlying image is retained. When a new container is created, it starts from the same read-only image with a fresh read-write layer, and thereby loses any modifications made during the prior container's life.
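This ephemerality is easy to observe from the command line. A minimal sketch, assuming a local Docker daemon and the public `alpine` image:

```shell
# Write a file into a container's read-write layer, then delete the container
docker run --name scratch alpine sh -c 'echo hello > /data.txt'
docker rm scratch

# A new container from the same image starts with a fresh read-write layer,
# so the file written above is gone (this command exits non-zero)
docker run --rm alpine test -e /data.txt
```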
With a mount, a file or directory on the host machine is mounted into a container, and the application can then use the data that has been mounted. This is one of the earliest options offered by Docker. A variation lets the container mount files or directories directly into the host machine's memory. These two options are exposed as two distinct mount types: bind and tmpfs.
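In Docker's CLI these map to the `bind` and `tmpfs` mount types. A brief sketch, assuming a local Docker daemon (the host path `$PWD` is just an example):

```shell
# Bind mount: map a host directory into the container
docker run --rm --mount type=bind,source="$PWD",target=/app alpine ls /app

# The shorthand -v form of the same bind mount
docker run --rm -v "$PWD":/app alpine ls /app

# tmpfs mount: backed only by host memory; contents vanish when the container stops
docker run --rm --tmpfs /scratch alpine sh -c 'echo fast > /scratch/f && cat /scratch/f'
```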
In this option, a data volume, which is essentially a directory managed by Docker, is created and designated to hold container data. This is a way to retain data even after the containers that use it are deleted. The data volume is initialized during container creation, and any file creation or writes can then be done directly within it. Creating Docker volumes offers several advantages over simply mounting directories: volumes are easier to manage than individual files and directories, and they can be shared across containers.
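A minimal sketch of the volume workflow, assuming a local Docker daemon (the volume name `mydata` is arbitrary):

```shell
# Create a named volume managed by Docker
docker volume create mydata

# Write into the volume from one container...
docker run --rm -v mydata:/data alpine sh -c 'echo persisted > /data/file.txt'

# ...and read it back from an entirely new container: the data survives
docker run --rm -v mydata:/data alpine cat /data/file.txt

# The volume outlives any container and is removed explicitly
docker volume rm mydata
```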
Probably the fastest growing ecosystem is that of vendors offering storage plugins for containers. With Kubernetes shipping a robust volume plugin framework, many vendors are quickly integrating and offering enterprise data services. The plugin architecture defines APIs that storage vendors can implement to expose their unique capabilities for containers to consume.
Storage plugins are probably the most efficient option, especially if containerized applications are to be run in production, since storage vendors already have mature products that can readily be extended for containers.
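At the Docker level, a vendor plugin is installed once and then used as a volume driver. This is a sketch only; `example/volume-plugin` and `enterprise-vol` are placeholders, not a real plugin or volume name:

```shell
# Install a vendor-supplied volume plugin (placeholder name)
docker plugin install example/volume-plugin

# Create a volume backed by that plugin's driver
docker volume create --driver example/volume-plugin enterprise-vol

# Containers consume it like any other volume
docker run --rm -v enterprise-vol:/data alpine ls /data
```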
MapR offers a unique set of storage services via its storage plugin. The MapR Data Platform is built on the foundation of a distributed, scalable file system with capabilities of high availability, data protection, and recovery.
MapR supports industry standard protocols to combine a powerful storage platform with data analytics. Additionally, MapR is an elastic platform as a whole: it can be deployed across disparate environments, including on-premises data centers, multi-cloud, and the edge. Organizations redesigning applications to a microservice architecture find MapR an ideal platform to work with.
Cloud vendors are taking major leaps towards offering containers as a service. Aside from introducing a dizzying array of acronyms like ECS, EKS, AKS, and GKE, they all strive to have organizations deploy containers in the public cloud. Managed container services have the benefit of being backed by the vendors' existing storage services, which together address persistent storage for containers. Containers and the cloud are a natural fit, given the elasticity and cost effectiveness of both. However, this requires organizations to already consume some services from the cloud vendors. Running applications entirely in the cloud may not be viable for many organizations, for whom combining a well-planned container transformation with a storage plugin may be a better choice.
As highlighted above, there are several options available. Data and applications are becoming more and more portable and fluid, and it’s only a matter of time before data truly knows no boundaries.