As you may have read in our Intel Optane SSD storage blog post, we’ve been deep into some exciting independent research here at CloudPassage. That research led us to ask: what is the real danger of not following best practices when protecting Docker registries? Unsurprisingly, the danger is real, and it has manifested itself as an interesting Docker Engine vulnerability, explained below.
We discovered that the Docker Engine is vulnerable to zip bombing, a very old technique in which an application is made to decompress a highly compressed file, exhausting system resources in the process.
The vulnerable versions include:
1.12.6-0, 1.10.3, 17.03.0, 17.03.1, 17.03.2, 17.06.0, 17.06.1, 17.06.2, 17.09.0, and earlier
After investigating what can happen in a Docker environment when best practices are not followed for filesystem driver selection (which we explored in our previously mentioned Intel post), we wanted to dig a little deeper to see what other dangers lurk behind default settings. Keeping in the vein of resource exhaustion conditions, we looked at other image building characteristics and uncovered a bug in the Docker engine itself.
The basic unit of a running Docker application is a container, which runs on the Docker engine. Running containers are instantiated from container images, which are easy to bundle and ship around in the form of zipped archives, or through Docker registries, which are essentially special-purpose content-addressable blob stores. When a Docker application is required by a Docker engine, the engine typically attempts to pull it from the registry, then expands it to disk and runs the enclosed application.
Docker container images are composed of layers. Each layer represents a file system delta from the prior step in the build process. If you’re familiar with virtual machine (VM) infrastructure, you’ll know this sounds a lot like how disk snapshots work for VMs. An important distinction is that the deltas represented in Docker image layers are file-level, not block-level as with VM snapshots. The Docker engine (with the help of overlay file systems) allows identical layers to be shared between different images, thereby decreasing the on-disk footprint of a collection of images with common parentage.
Each image layer that a Docker engine pulls from a Docker registry contains a tarfile of the changes made to the file system during a specific step in the build process. The whole layer is delivered as a gzip file, and is then expanded into a directory on disk (which usually ends up somewhere under /var/lib/docker, depending on your configured filesystem driver). Once all layers are expanded and written to disk, the Docker engine uses the operating system’s OverlayFS (assuming Linux kernel version 4+ and that best practices were followed when configuring the Docker engine) to compose all these layers into one file system mount. This mount becomes the root filesystem for the application inside the container. If you’ve been a BSD or Linux sysadmin, you’re likely seeing some similarity to chroot jails in all this.
It is worth pointing out that the Linux kernel implements a number of protections (cgroups, namespaces, etc.) that limit the resource consumption and access of running containers, creating a more stable and secure operating environment within the Docker engine. The Docker Engine itself, however, doesn’t put a limit on at least one operation that isn’t strictly involved in running the container, and that’s where things get interesting.
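To make the layer format concrete, here is a minimal sketch in Python that packs a few files into a tar archive, gzips it, and lists the entries back out. The function names build_layer and list_layer are our own, purely for illustration; they are not part of Docker, and a real layer carries more metadata than this toy example.

```python
import gzip
import io
import tarfile


def build_layer(files: dict[str, bytes]) -> bytes:
    """Pack files into a tar archive and gzip it, mimicking a layer blob.

    Hypothetical helper for illustration only, not Docker's own code.
    """
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return gzip.compress(tar_buf.getvalue())


def list_layer(blob: bytes) -> list[tuple[str, int]]:
    """Return (path, size) for every entry in a gzipped tar blob."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return [(member.name, member.size) for member in tar.getmembers()]


if __name__ == "__main__":
    layer = build_layer({"etc/motd": b"hello\n"})
    print(list_layer(layer))
```

A well-formed layer round-trips cleanly like this. The interesting question, which we get to shortly, is what happens when the gzip stream holds something else entirely.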
So here’s where it all breaks down
We tested the impact of a bloated layer on the Docker engine by creating a large text file (20GB worth of zeroes) during a regular build process, loading the resulting image into a Docker registry, and running it from a separate Docker engine. It worked, and after the layer containing the large file decompressed, the container was able to run. (The Docker registry is a content-addressable blob store, remember?) Our next question was: How much does the Docker engine trust what the Docker registry publishes? The answer ended up being: probably more than it should.
Within the Docker registry, each image is represented by a manifest file. This manifest file contains metadata such as the commands that were run in the build process to create the image, the SHA256 fingerprint of each layer (that’s the content-addressable key), and other information pertaining to the image. In order to update an image, the client (Docker engine or other custom client) must pull the manifest, upload the changed layers (in the form of a gzip file for each layer), and update the manifest to reference the new layers. Ideally, the gzip will contain a tarball; that’s what the Docker engine expects, anyway. One would further expect that some validation would happen within the engine, before decompressing each layer, to ensure that the content is not malformed. This was not the case.
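The content-addressable key is worth a closer look, because it explains why a fingerprint check alone doesn’t help here. The sketch below assumes the common "sha256:<hex>" digest notation and assumes the digest is computed over the blob bytes exactly as uploaded; blob_digest and verify_blob are hypothetical names of our own, not registry or engine APIs.

```python
import hashlib


def blob_digest(blob: bytes) -> str:
    """Compute a content-addressable key in 'sha256:<hex>' form.

    Assumption: the digest covers the blob bytes exactly as uploaded.
    """
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def verify_blob(blob: bytes, expected_digest: str) -> bool:
    """Check that a downloaded blob matches the digest named in a manifest."""
    return blob_digest(blob) == expected_digest


if __name__ == "__main__":
    payload = b"any bytes at all"
    digest = blob_digest(payload)
    print(digest, verify_blob(payload, digest))
```

Under those assumptions, a matching digest only proves that the bytes received are the bytes that were uploaded. It says nothing about whether those bytes decompress into a sane tarball.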
To test the layer validation in the Docker engine, we compressed 20GB of zeroes into a single gzip archive. This wasn’t a file inside a tar archive: just 20GB of zeroes, gzipped down to around 20MB, and delivered as another layer for an existing image in the Docker registry. We had one Ubuntu container running inside the Docker engine when we attempted to pull the poisoned image. The Docker engine ran out of memory and died, taking with it the ‘canary’ container that was running when we performed the poisoned image pull. Bingo! As long as the uncompressed contents of the gzip file exceed the amount of available memory, the engine falls over and all running containers are killed.
If you’re following best practices, this is certainly an edge case. Someone would have to compromise credentials to an account with write access to an image in a Docker registry you use in your environment. You would have to have your private registry open to the world.
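The 20GB-to-20MB figure is easy to sanity-check at a smaller scale. The following sketch (zero_ratio is our own illustrative name) compresses a run of zero bytes with Python’s gzip module and reports the ratio; it uses 10MB rather than 20GB so it is harmless to run.

```python
import gzip


def zero_ratio(size_bytes: int) -> float:
    """Return uncompressed/compressed size for a run of zero bytes."""
    raw = bytes(size_bytes)       # size_bytes zero bytes
    packed = gzip.compress(raw)   # default (maximum) compression level
    return len(raw) / len(packed)


if __name__ == "__main__":
    ratio = zero_ratio(10 * 1024 * 1024)
    print(f"10 MiB of zeroes compresses roughly {ratio:.0f}:1")
```

Deflate tops out at a ratio of roughly a thousand to one on this kind of input, which is in line with what we saw: a small upload that turns into a very large allocation on the receiving end.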
However, the default configuration of the Docker registry does not implement encryption or authentication, and embedded authentication material continues to be a common mistake. A quick search on https://shodan.io for “Docker-Distribution-Api-Version: registry/2.0” returns over 750 public-facing registries. We didn’t go through the trouble of checking which ones lacked properly configured authentication, but surely we can all agree that it would be far better not to expose any registries unnecessarily, so that compromised credentials would be much harder to use.
So how can you protect yourself?
1. Don’t use images you don’t trust.
2. If you run your own private registry:
3. Keep this sort of scenario in mind when you implement your automation processes.
4. Analyze the images you use, early and often.
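As one possible shape for the “analyze early” step, here is a minimal sketch of a size-capped check that could run in a pipeline before an image blob is ever handed to an engine. It streams the gzip data in small chunks and gives up as soon as the decompressed size passes a cap, so the checker itself never holds the whole payload in memory. The names LayerTooLarge and measure_uncompressed_size and the 64KB chunk size are our own assumptions; this is not how the Docker engine handles layers.

```python
import gzip
import io


class LayerTooLarge(Exception):
    """Raised when a blob decompresses past the allowed size."""


def measure_uncompressed_size(blob: bytes, max_bytes: int,
                              chunk_size: int = 64 * 1024) -> int:
    """Stream-decompress a gzip blob, discarding output and counting bytes.

    Hypothetical pre-flight check: raises LayerTooLarge as soon as the
    running total exceeds max_bytes, instead of decompressing everything.
    """
    total = 0
    with gzip.GzipFile(fileobj=io.BytesIO(blob)) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                return total
            total += len(chunk)
            if total > max_bytes:
                raise LayerTooLarge(
                    f"decompressed size exceeds {max_bytes} bytes")


if __name__ == "__main__":
    suspicious = gzip.compress(bytes(5 * 1024 * 1024))
    try:
        measure_uncompressed_size(suspicious, max_bytes=1024 * 1024)
    except LayerTooLarge as err:
        print("rejected:", err)
```

The cap is a policy decision that depends on the images you legitimately run, so treat the one-megabyte value in the usage example as a placeholder.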
This discovery owes a huge debt of gratitude to Hana Lee (GitHub: @mong2) for proving the vulnerability and creating the original proof-of-concept exploit.
Docker (Moby) bug reference: https://github.com/moby/moby/issues/35075