One of the main benefits of using Docker and container technology is the portability of applications. It’s possible to spin up an application on-site or in a public cloud environment in a matter of minutes. What enables this is a storage technology that takes a layered “copy on write” approach to storing data in the file system used by Docker itself. Let's take a look at how it works.
We expect application code to change quite frequently. Developers like to roll out new features, patches and bug fixes with alarming frequency. Many of those changes, however, are pretty minor and might amount to a difference of a few bytes in either the source code or the executable used to run the application. It makes no sense to ship the entire application out when modifications are so small.
The folks at Docker spotted this issue early on, as it’s clear they wanted to make application deployment as flexible and portable as possible. As a result, the design of Docker images (the term for a runnable Docker application) is based on managing changes using a union mount file system (more on these in a moment). To imagine how this works, think of the layers of an onion. Working from the center outward, each layer represents a change to a directory or set of files. The base operating system or application sits in the center of the onion; the layers represent the changes to the base layer over time, and the whole onion represents everything in the file system.
Each layer, which is identified by a GUID, introduces code changes, patches or perhaps configuration files. In many instances these changes can literally be a few bytes in length. Because an image is built from layers, an application update can be delivered to a host server by shipping only the layers that differ from those already downloaded. The result is that developers can amend and ship code without having to distribute the entire application image. This is a much more efficient process that saves network bandwidth, time and effort.
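To make the idea of shipping only the differences concrete, here is a minimal Python sketch. It is not how Docker itself is implemented; the layer IDs and the function name `layers_to_download` are made up for illustration, and an image is modelled as nothing more than an ordered list of layer IDs.

```python
def layers_to_download(image_layers, cached_layers):
    """Return the layers of an image that the host does not already hold.

    image_layers  -- ordered list of layer IDs, base layer first
    cached_layers -- iterable of layer IDs already present on the host
    """
    cached = set(cached_layers)
    # Keep the original order so layers can be applied base-first.
    return [layer for layer in image_layers if layer not in cached]


# Hypothetical layer IDs: two versions of an app that share a base and runtime.
app_v1 = ["base-os", "runtime", "app-code-v1"]
app_v2 = ["base-os", "runtime", "app-code-v2"]

# The host has already pulled version 1.
host_cache = set(app_v1)

needed = layers_to_download(app_v2, host_cache)
print(needed)  # ['app-code-v2']
```

Under these assumptions, moving from version 1 to version 2 transfers one small layer rather than the whole image.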
Union mount file systems
Image layers in Docker are managed by the operating system using union mount file systems that work in conjunction with a standard file system such as ext3 or ext4. Union mount file systems have been available for Linux for over 20 years. The first implementation, the Inheriting File System, was abandoned and replaced by UnionFS. Later implementations such as aufs and OverlayFS have also been used. Docker can use a range of union mount file systems through storage drivers, and which driver is used is dictated by the Linux distribution. The drivers include the already mentioned OverlayFS and aufs, plus Device Mapper, Btrfs, VFS and ZFS.
The choice of driver also depends on the format of the underlying file system. Unfortunately, that choice isn’t straightforward, so unless your container environment has specific requirements around performance, it’s best to go with the default storage driver offered at the time of installing Docker.
Layers are used both for distributing read-only Docker images and for storing data in active containers. When a container is started, the Docker Engine creates a writable layer that sits above the read-only image layers. This holds all of the configuration changes made in the container at start-up time, plus any changes made by the application or user. These changes can be viewed in the /var/lib/docker directory, under a separate folder for each container.
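The behavior of a writable layer over read-only layers can be imitated in a few lines of Python. This is only a toy model: the file paths and contents are invented, each layer is a plain dictionary, and the standard library's `ChainMap` stands in for the union mount (it looks keys up from top to bottom and sends every write to the topmost mapping).

```python
from collections import ChainMap

# Read-only image layers (hypothetical paths and contents).
base_layer = {"/etc/os-release": "base OS", "/app/config.ini": "debug=false"}
app_layer = {"/app/server.py": "print('v1')"}


def start_container(*image_layers):
    """Stack a fresh, empty writable layer on top of the image layers.

    Lookups search the layers in the order given (upper layers first);
    writes only ever touch the first mapping, the container's own layer.
    """
    return ChainMap({}, *image_layers)


# Two containers started from the same image share its read-only layers.
container_a = start_container(app_layer, base_layer)
container_b = start_container(app_layer, base_layer)

# Container A changes a file: the new version lands in A's writable layer.
container_a["/app/config.ini"] = "debug=true"

print(container_a["/app/config.ini"])  # debug=true
print(container_b["/app/config.ini"])  # debug=false
print(base_layer["/app/config.ini"])   # debug=false
print(container_a.maps[0])             # {'/app/config.ini': 'debug=true'}
```

The image layers are never modified, so any number of containers can share them. A real union file system also has to handle deletions and partial file updates, which this sketch ignores.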
Docker images are stored in a repository, either on the Docker Hub (a public set of repositories) or, with more advanced offerings such as Docker Datacenter, in the Docker Trusted Registry. It’s easy to see how an efficient distribution mechanism allows dozens, hundreds or even thousands of server hosts to run Docker images without having to download the entire image content each time code changes. This is particularly valuable when containers move around the physical infrastructure, because the latest version of an application image needs to reach each host as quickly as possible.
Content addressable images
With the release of Docker 1.10, the deployment model of images changed from a layered approach that used GUIDs to track each layer to the use of secure content hashes to track components of an image. This new design improves security because the hashes are cryptographically generated from the content, and it reduces the risk of collisions from duplicate GUIDs.
The new model also introduces the ability to reuse components between separate images that haven’t been generated from the same base. Under the old method, code duplicated between separate images required a separate download for each. If two versions of CentOS, for example, were created independently, then the content would be duplicated in the images downloaded from the Docker Hub (or an end user's own registry). Now those downloads can share components and reduce network traffic.
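The principle behind content addressing can be sketched in Python. This is a simplified illustration and not Docker's actual image format: the `LayerStore` class and the sample bytes are invented, and the identifier is simply a SHA-256 digest of the raw content.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself rather than a random GUID."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class LayerStore:
    """Toy local store that keeps one copy of each distinct piece of content."""

    def __init__(self):
        self._blobs = {}

    def add(self, data: bytes):
        """Store data and return (digest, was_new)."""
        digest = content_id(data)
        was_new = digest not in self._blobs
        if was_new:
            self._blobs[digest] = data
        return digest, was_new

    def __len__(self):
        return len(self._blobs)


store = LayerStore()

# Two images built independently that happen to contain identical content.
centos_from_team_a = b"identical base OS content"
centos_from_team_b = bytes(centos_from_team_a)  # a separate but equal copy

id_a, stored_a = store.add(centos_from_team_a)
id_b, stored_b = store.add(centos_from_team_b)

print(id_a == id_b)        # True: same content, same identifier
print(stored_a, stored_b)  # True False: the second copy was not stored again
print(len(store))          # 1
```

Because the identifier depends only on the bytes, it doesn't matter who built the content or when; identical components resolve to the same entry and need to be fetched only once.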
Docker has worked hard to ensure that image creation and distribution are as efficient as possible. Probably the only area where work could still be done is in improving the visibility of images and the space they occupy on disk. The docker images command shows the size of images downloaded to a host, but not the physical space occupied on disk. Naturally, some of the reported space is counted more than once where images share content, so a view of the total space actually occupied by images would be helpful.
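A small Python example shows why per-image sizes and physical disk usage can differ. The image names, layer names and sizes (in MB) are all invented, and the calculation assumes that a layer shared by several images is stored on disk only once.

```python
# Hypothetical images, each mapping layer ID -> layer size in MB.
images = {
    "web:1.0": {"layer-base": 200, "layer-runtime": 50, "layer-web": 5},
    "api:1.0": {"layer-base": 200, "layer-runtime": 50, "layer-api": 8},
}


def sum_of_image_sizes(images):
    """Add up the size of every image, counting shared layers once per image."""
    return sum(sum(layers.values()) for layers in images.values())


def unique_space(images):
    """Add up the size of each distinct layer exactly once."""
    unique_layers = {}
    for layers in images.values():
        unique_layers.update(layers)
    return sum(unique_layers.values())


print(sum_of_image_sizes(images))  # 513
print(unique_space(images))        # 263
```

In this made-up case the per-image figures add up to nearly twice the space the layers would need if each were stored once.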
However, the Docker platform is fast evolving, so I’m sure we can expect further advances in image management before we know it.