Docker containers provide a real sea change in the way applications are written, distributed and deployed. The aim of containers is to be flexible and allow applications to be spun up on-demand, whenever and wherever they are needed. Of course wherever we use our applications, we need data.
There are two schools of thought on how data should be mapped into containers. The first says we keep the data only in the container; the second says we have persistent data outside of the container that extends past the lifetime of any individual container instance. In either scenario, the issue of security poses big problems for data and container management.
Managing data access
As discussed in my previous blog, there are a number of techniques for assigning storage to a Docker container. Temporary storage capacity, local to the host running the container can be assigned at container run time. Storage volumes assigned are stored within the host in a specific subdirectory mapped to the application. Volumes can be created at the time the container is instanced, or in advance using the “docker volume” command.
Alternatively, local storage can be mapped as a mount point into the container. In this instance, the “docker run” command specifies a local directory as the mounted point within the container. The third option is to use a storage plugin that directly associates external storage with the container.
In each of the described methods, the Docker framework provides no inherent security model for data. For example, any host directory can be mounted to a container, including sensitive system folders like /etc. It’s possible for a container to then modify those files, as permissions are granted using standard, simple Unix permission settings. An alternative and possibly better practice is to consider using non-root containers, which involves running containers under a different Linux user ID (UID). This is relatively easy to do, however does mean building a methodology to secure each container with either a group ID (GID) or UID as permissions checking is done on UID/GID numbers.
Here we run into another problem: Using non-root containers with local volumes doesn’t work, unless the UID used to run the container has permissions to the /var/lib/docker/volumes directory. Without this, data can’t be accessed or created. Opening up this directory would be a security risk; however, there’s no inherent method to set individual permissions on a per-volume basis without a lot of manual work.
If we look at how external storage has been mounted to a container, many solutions simply present a block device (a LUN) to the host running the container and format a file system onto it. This is then presented into the container as a mount point. At this point, the security on directories and files can be set by within container itself, reducing some of the issues we’ve discussed. However, if this LUN/volume is reused elsewhere, there are no security controls about how it is mounted or used on other containers, as there is no security model built directly into the container/volume mapping relationship. Everything depends on trusting the commands run on the host.
This is where we have yet another issue: a lack of multi-tenancy. When we run containers, each container instance may run for a separate application. As in traditional storage deployments, storage assigned to containers should have a degree of separation to ensure data can’t be inadvertently or maliciously accessed cross-application. There’s no easy way to currently do this at the host level, other than to trust the orchestration tool running the container and mapping it to data.
Finding a solution
Obviously some of the issues presented here are Linux/Unix specific. For example, the abstraction of the mount namespace provides different entry points for our data, however there’s no abstraction of permissions – I can’t map user 1,000 to user 1,001 without physically updating the ACL (access control list) data associated with each file and directory. Making large-scale ACL changes could potentially impact performance. For local volumes, Docker could easily set the permissions of the directory on the host that represents a new volume to match the UID of the container being started.
External volumes provide a good opportunity to move away from the permissions structure on the host running containers. However, this means that a mechanism is required to map data on a volume to a known trusted application running in a specific container instance. Remember that containers have no inherent “identification” and can be started and stopped at will. This makes it hard to determine whether any individual container is the owner of a data volume.
Today the main solution is to rely on the orchestration platform that manages the running of the containers themselves. We put the trust into these systems to map volumes and containers accurately. In many respects, this isn’t unlike traditional SAN storage or the way virtual disks are mapped to virtual machines. However, the difference for containers is the level of portability they represent and the need to have a security mechanism that extends to the public cloud.
There’s still some work to be done here. For Docker, its acquisition of storage startup Infinit may spur ideas about how persistent data is secured. This should hopefully mean the development of an interface that all vendors can work towards -- storage “batteries included” but optional.
Learn more about containers at Interop ITX, May 15-19 in Las Vegas. Container sessions include "Managing Containers in Production: What You Need To Think About," and "The Case For Containers: What, When, and Why?" Register now!