Learn how container-based storage is implemented and how it is blurring the lines between data, storage, and applications.
In my previous blog, I discussed how persistent storage is needed to ensure that data continues to exist after a container terminates. This persistent storage is expected to sit on a traditional storage array or perhaps a software-defined storage (SDS) solution running across commodity hardware. So in an SDS world, couldn’t containers simply be used to deliver storage itself?
The idea of using containers to deliver persistent storage seems counterintuitive given how containers are expected to work. A container is typically seen as ephemeral or short-lived, with no persistent resources connected to it. Conversely, storage is expected to be resilient, persistent and the single point of truth for all of our data. However, vendors are starting to bring products to market that use containers to implement the storage platform or layer.
Container concepts and storage design
The idea of using containers for persistent storage delivery brings together two concepts that optimize the speed of applications: Make application code as lightweight as possible, and put the data as close to the application as possible. The idea of lightweight code is pretty simple to grasp; putting data closer to the application requires a bit more thought.
Historically, or at least over the last 15 years, data has been stored on external storage arrays to gain the benefits of scale, performance, efficiency and resiliency. The trade-off in this design has been the time taken to access the data over the storage area network (SAN). With disk-based systems, the SAN overhead wasn’t really noticeable. As we move into the flash era, the time taken to traverse the network, plus the time spent executing the “storage stack” code, is becoming increasingly obvious in application response times.
The answer has been to cache data locally with the application, either in the hypervisor (for virtual machines) or within the host. Now imagine a container environment in which storage is implemented within a container running on the same host as the application: the time taken to access that data would potentially be minimal. Remember that a container is just a collection of Linux processes, so container-to-container communication on the same host is fast.
The rationale makes sense, but how can container-based storage be implemented? The first consideration is to think of containerization as the opportunity to virtualize and abstract pieces of the storage stack. For example, we can create a container that simply manages an individual piece of storage media like a disk drive or SSD. Interaction with that container allows us to store and retrieve data on the device. The container can manage how data is distributed, handle metadata, and communicate with other containers to handle data protection and replication. If the container fails, we simply restart it; as long as there is enough metadata on the physical device to help the container restart, then no other persistence is required.
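To make the restart idea concrete, here is a minimal sketch of a “disk handling” service, not any vendor’s actual implementation. Everything in it is an assumption for illustration: the DiskHandler class, the index.json and data.bin file names, and the use of a plain directory to stand in for a physical device. The point it shows is that all state needed to recover lives on the device itself, so a restarted handler can pick up where the failed one left off.

```python
import json
import os
import tempfile


class DiskHandler:
    """Hypothetical service that owns one storage device.

    The "device" is simulated by a directory; a real handler would
    manage a disk drive or SSD directly.
    """

    def __init__(self, device_path):
        # Both the data and the metadata (index) live on the device.
        self.index_path = os.path.join(device_path, "index.json")
        self.data_path = os.path.join(device_path, "data.bin")
        # On start or restart, rebuild in-memory state from the
        # on-device metadata alone.
        if os.path.exists(self.index_path):
            with open(self.index_path) as f:
                self.index = json.load(f)
        else:
            self.index = {}

    def put(self, key, payload):
        """Append payload bytes to the device and record where they landed."""
        with open(self.data_path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        self.index[key] = [offset, len(payload)]
        self._persist_index()

    def get(self, key):
        """Read a payload back using the offset and length in the index."""
        offset, length = self.index[key]
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def _persist_index(self):
        # Write to a temporary file, then rename it into place, so a
        # crash mid-write does not leave a half-written index behind.
        tmp_path = self.index_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.index, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.index_path)


if __name__ == "__main__":
    device = tempfile.mkdtemp()           # stand-in for a physical device
    handler = DiskHandler(device)
    handler.put("block-1", b"hello")

    # Simulate the container failing and being restarted: a new
    # instance is created with nothing but the device to go on.
    restarted = DiskHandler(device)
    print(restarted.get("block-1"))
```

A real storage container would also handle replication, data protection and communication with its peers, which this sketch leaves out; it only illustrates why no persistence beyond the device is required for a restart.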
This kind of design is attractive because it encapsulates processes within individual microservices. If the “disk handling” container needs to be amended, it can be changed without having to recompile the entire storage platform. In addition, this kind of abstraction means storage components could be developed for any platform capable of running containers (currently Linux and Windows) and interoperate very easily.
Now, who’s using this kind of deployment model today? StorageOS, a UK-based startup, has developed a lightweight container application that runs across multiple hosts/nodes and has a mere 40 MB footprint. Portworx has developed a similar solution that runs services across multiple containers to deliver a scale-out architecture based on commodity hardware. Scality has started to introduce container-based microservices functions into its RING software.
It’s easy to assume container-based storage is only used by storage startups, but that’s not the case. EMC’s Unity storage array (the evolution of Clariion and VNX) uses containers to run data movers, providing a much more scalable solution for serving front-end I/O. Although this doesn’t strictly adhere to the design principles discussed above, it shows that the use of containers for delivering storage is starting to become widespread.
Moving apps to storage
So where do we go next? Well, some storage vendors have already started providing the capability of moving the application to the storage. Coho Data began offering the ability to run application code in a container on the storage platform in 2015. Zadara Storage provides the same capability in its platform. Both of these implementations were initially seen as a way to run data-intensive work such as virus scanning or compliance checking, but could equally be used to run persistent database applications.
What’s clear is that the line between data, storage and the application is being blurred to the point that all three can coexist on the same infrastructure. This was one of the key benefits of hyperconvergence, which started the trend toward eliminating dedicated storage hardware. Delivering storage with containers takes us one step further by eliminating the boundary created by running apps and storage in separate VMs. We are inexorably moving closer to the goal of a truly software-defined data center.