Docker Layer Caching

This document will show you how to use Layer Caching in Docker to make your builds faster and how to apply it in CI/CD workflows on Semaphore.

About Layer Caching in Docker

Docker builds container images from layers. Each command in a Dockerfile creates a new layer, and each layer records the filesystem changes between the state of the image before the command runs and the state after it.

Docker uses a layer cache to optimize and speed up the process of building Docker images.

Docker Layer Caching mainly works on the RUN, COPY and ADD commands, which will be explained in more detail next.

The RUN Command

The RUN command allows you to execute a command in the Docker image. If the layer generated by a RUN command already exists in the cache, Docker reuses that layer and does not execute the command again.

As you will see later, a COPY or an ADD command can invalidate the layer cache and force Docker to re-execute all subsequent RUN commands.

The COPY Command

The COPY command in a Dockerfile allows you to import one or more external files into a Docker image. When executed, the COPY command ensures that the image has the latest version of all relevant external files.

If the contents of all external files referenced by the first COPY command are unchanged since the previous build, the layer cache will be used, and all subsequent commands up to the next ADD or COPY command will also use the layer cache.

However, if the contents of one or more external files have changed, then all subsequent commands will be executed without using the layer cache.

To take advantage of Layer Caching in Docker, structure your Dockerfile so that frequently changing steps such as COPY are located near the end of the file. This ensures that steps whose results do not change between builds are not unnecessarily rebuilt.
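As a minimal sketch of this ordering, the following shell snippet writes a hypothetical Dockerfile named D3 (it is not part of the example project). It installs a package before copying hw.go; the git package is only an illustrative stand-in for a slow step that rarely changes. With this layout, editing hw.go should invalidate the cache only from the COPY line onward, so the package installation can be served from the cache.

```shell
# Write an illustrative Dockerfile (D3 is a hypothetical file name).
cat > D3 <<'EOF'
FROM golang:alpine

# Rarely changing steps first: these layers stay cached when hw.go changes.
RUN apk add --no-cache git
RUN mkdir /files
WORKDIR /files

# Frequently changing input last: a change to hw.go invalidates the cache
# only from this line onward.
COPY hw.go /files
RUN go build -o /files/hw hw.go
EOF
```

The file can then be built in the same way as the other Dockerfiles in this document, for example by copying it to Dockerfile before running docker build.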

The ADD Command

The ADD command in a Dockerfile allows you to import external files into a Docker image.

If the contents of all external files referenced by the first ADD command are unchanged since the previous build, the layer cache will be used, and all subsequent commands up to the next ADD or COPY command will also use the layer cache.

However, if the contents of one or more external files have changed, then all subsequent commands will be executed without using the layer cache.

To take advantage of Layer Caching in Docker, structure your Dockerfile so that frequently changing steps such as ADD are located near the end of the file. This ensures that steps whose results do not change between builds are not unnecessarily rebuilt.

About --cache-from

The --cache-from command line option of the docker build command allows you to build a new image using a pre-existing one as the cache source. You will see this in more detail in the next section.

An example Semaphore 2.0 project

First, you will need to create a Docker Registry secret in Semaphore 2.0. Let's name it docker-hub. You can inspect it as follows:

$ sem get secrets docker-hub
apiVersion: v1beta
kind: Secret
metadata:
  name: docker-hub
  id: a2aaefdb-a4ff-4bc2-afd9-2afa9c7f3e51
  create_time: "1538456457"
  update_time: "1538456537"
data:
  env_vars:
    - name: DOCKER_USERNAME
      value: docker-username
    - name: DOCKER_PASSWORD
      value: docker-password
  files: []

The following illustrates the use of Docker Layer Caching in Semaphore 2.0 projects:

version: v1.0
name: Using Docker Layer Cache
agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu1804

blocks:
  - name: Create Docker image
    task:
      jobs:
        - name: Store Docker image in Registry
          commands:
            - checkout
            - echo "$DOCKER_PASSWORD" | docker login --username "$DOCKER_USERNAME" --password-stdin
            - cp D1 Dockerfile
            - docker build -t go_hw:v1 .
            - docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
            - docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"

            - docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"
            - docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"

      secrets:
      - name: docker-hub

  - name: Use previous image
    task:
      jobs:
        - name: Use restored Docker image as cache
          commands:
            - checkout
            - docker images
            - echo "$DOCKER_PASSWORD" | docker login --username "$DOCKER_USERNAME" --password-stdin
            - cp D2 Dockerfile
            - docker pull "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
            - docker build --cache-from "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH" -t go_hw:v2 .
            - docker images
            - docker run go_hw:v2

      secrets:
      - name: docker-hub

The .semaphore/semaphore.yml file has two blocks. The first one creates a Docker image that is reused in the second block using the --cache-from command line parameter.

The block named "Use previous image" simulates the case where a number of unchanged layers will be reused from an image that was pulled from the Docker Registry. In this case, all layers will be reused. The following docker commands use this functionality:

- docker pull "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
- docker build --cache-from "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH" -t go_hw:v2 .

The docker pull command gets an existing Docker image from the Docker Registry, whereas the docker build command uses the --cache-from option to try to reuse as many of the existing layers of the $DOCKER_USERNAME/go_hw:$SEMAPHORE_GIT_BRANCH image as possible.
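One practical wrinkle is that on the first build of a branch there is no image to pull yet, and a failing docker pull would fail the job. A hedged sketch of one way to handle this follows. The function name build_with_cache and the DOCKER variable (which exists only so that the docker binary can be stubbed out for a dry run) are assumptions of this sketch, not Semaphore or Docker features.

```shell
# Allow the docker binary to be overridden, e.g. DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# build_with_cache CACHE_IMAGE TARGET_TAG [CONTEXT]
# Pulls CACHE_IMAGE if it exists, then builds TARGET_TAG using it as a cache source.
build_with_cache() {
  cache_image="$1"
  target_tag="$2"
  context="${3:-.}"

  # A missing cache image is not treated as an error: the build starts cold.
  "$DOCKER" pull "$cache_image" || echo "no cache image found, building without it"

  "$DOCKER" build --cache-from "$cache_image" -t "$target_tag" "$context"
}
```

In the pipeline above, the pull and build steps could then be expressed as build_with_cache "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH" go_hw:v2. Note that with the BuildKit builder, the cached image generally also needs to have been built with inline cache metadata (--build-arg BUILDKIT_INLINE_CACHE=1) for --cache-from to be effective.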

- docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
- docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"

These commands tag an existing Docker image and push it to the Docker Registry so that it can be found and reused as a cache image.

- docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"
- docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"

The last two docker commands should be executed when you want to deploy a Docker image to production. The first one tags an existing Docker image so that it can be associated with the Git SHA value and the Workflow ID of the Semaphore 2.0 project, and the second one pushes that image to the Docker Registry.

The contents of the D1 file are as follows:

$ cat D1
FROM golang:alpine

RUN mkdir /files
COPY hw.go /files
WORKDIR /files

RUN go build -o /files/hw hw.go

The contents of the D2 file are as follows:

$ cat D2
FROM golang:alpine

RUN mkdir /files
COPY hw.go /files
WORKDIR /files

RUN go build -o /files/hw hw.go
ENTRYPOINT ["/files/hw"]

The D2 file includes all the contents of the D1 file and adds one more line at the end of it, which makes it a perfect candidate for using the Docker image created by using D1 as a cache.

To improve the cache hit rate, you should use the SEMAPHORE_GIT_BRANCH Semaphore 2.0 environment variable as the tag. This way, each branch will have its own cache and you will avoid cache collisions.
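One caveat when using the branch name as a tag is that Git branch names may contain characters such as /, which are not valid in Docker tags. A minimal sketch of a sanitizing helper follows. The function name cache_tag is hypothetical, and the replacement rule (map every character other than letters, digits, underscore, period and hyphen to a hyphen, then drop leading periods and hyphens) is one reasonable choice rather than a Semaphore convention.

```shell
# cache_tag BRANCH
# Prints a Docker-safe tag derived from a Git branch name.
cache_tag() {
  # 1. Replace characters that are not allowed in a tag with '-'.
  # 2. Strip leading '.' and '-', which a tag may not start with.
  # 3. Truncate to the 128-character tag length limit.
  printf '%s' "$1" \
    | tr -c 'A-Za-z0-9_.-' '-' \
    | sed -e 's/^[.-]*//' \
    | cut -c1-128
}
```

For example, the tagging step could become docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$(cache_tag "$SEMAPHORE_GIT_BRANCH")", with the same expression used in the later docker pull and --cache-from steps. Two branches whose names differ only in replaced characters would share a tag under this rule.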

See Also