Docker layer caching

Overview

This document shows how to use Docker layer caching to make your builds faster. It is intended for Semaphore 2.0 users who build Docker images in their Semaphore 2.0 projects.

About Layer Caching in Docker

Docker builds container images out of layers. Each command in a Dockerfile creates a new layer, and each layer contains the filesystem changes between the state of the image before the command was executed and the state after it.

Docker uses a layer cache to optimize the process of building Docker images and make it faster.

Docker Layer Caching mainly concerns the RUN, COPY and ADD commands, which are explained in more detail below.

The RUN Command

The RUN command allows you to execute a command in the Docker image. If the layer that the RUN command would generate already exists in the cache, Docker reuses the cached layer and does not execute the command again.

As you will see shortly, a COPY or an ADD command can invalidate the layer cache and force Docker to execute all subsequent RUN commands again.
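The invalidation behaviour described above can be pictured with a small toy model. The sketch below does not call Docker; the `simulate_cache` function and its "INSTRUCTION:STATE" notation are made up for illustration. It only shows the rule that, once one step misses the cache, every later step is rebuilt too.

```shell
# Toy model of the layer cache: no Docker involved.
# Each argument is "INSTRUCTION:STATE", where STATE is "same" or "changed"
# (a notation invented for this sketch).
simulate_cache() {
  invalidated=0
  for step in "$@"; do
    name=${step%%:*}    # text before the first ":"
    state=${step##*:}   # text after the last ":"
    if [ "$state" = "changed" ]; then
      invalidated=1
    fi
    # After the first miss, every following step is rebuilt.
    if [ "$invalidated" -eq 1 ]; then
      echo "$name: rebuilt"
    else
      echo "$name: cached"
    fi
  done
}

simulate_cache "RUN mkdir:same" "COPY hw.go:changed" "WORKDIR:same" "RUN go build:same"
```

In this model only the first step is reported as cached, because the changed COPY step invalidates everything that follows it.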

The COPY Command

The COPY command in a Dockerfile allows you to import one or more external files into a Docker image. Docker always examines the files referenced by a COPY command, so that the image gets the latest version of each external file.

If the contents of all external files of the first COPY command are unchanged, the layer cache is used, and all subsequent commands up to the next ADD or COPY command also use the layer cache.

However, if the contents of one or more external files are different, then all subsequent commands will be executed without using the layer cache.
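To make the "same contents versus different contents" distinction concrete, here is a minimal sketch that compares a file with an earlier version of itself by checksum. It is an illustration only: it does not call Docker, the `checksum` helper is hypothetical, and it assumes the `sha256sum` utility is available (as it is on most Linux systems). Docker's own bookkeeping may differ in detail.

```shell
# Toy illustration only: compare file contents by checksum to decide
# between "cache hit" and "cache miss". Docker is not involved.
checksum() {
  # sha256sum prints "<hash>  <file>"; keep only the hash.
  sha256sum "$1" | cut -d ' ' -f 1
}

workdir=$(mktemp -d)
printf 'package main\n' > "$workdir/hw.go"
before=$(checksum "$workdir/hw.go")

# Rewriting the file with identical contents keeps the checksum.
printf 'package main\n' > "$workdir/hw.go"
if [ "$(checksum "$workdir/hw.go")" = "$before" ]; then
  echo "same contents: cache hit"
fi

# Editing the file changes the checksum.
printf 'package main\n// edited\n' > "$workdir/hw.go"
if [ "$(checksum "$workdir/hw.go")" != "$before" ]; then
  echo "different contents: cache miss"
fi

rm -rf "$workdir"
```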

To take advantage of Layer Caching in Docker, structure your Dockerfile so that frequently changing steps, such as COPY, are located towards the end of the file. This ensures that steps which produce the same result on every build are not rebuilt unnecessarily.
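As a rough sketch of this ordering advice, the script below writes an example Dockerfile in which rarely changing inputs are copied before frequently changing ones. The `write_dockerfile` helper and the `go.mod` and `go.sum` files are assumptions made for illustration; they are not part of the example project used later in this document.

```shell
# Write an example Dockerfile that orders steps from least to most
# frequently changing. go.mod and go.sum are assumed to exist in the
# project; they are hypothetical here.
write_dockerfile() {
  cat > "$1" <<'EOF'
FROM golang:alpine
WORKDIR /files

# Dependency manifests change rarely, so copy them first.
COPY go.mod go.sum ./
RUN go mod download

# Source files change often, so copy them as late as possible.
COPY hw.go ./
RUN go build -o /files/hw hw.go
EOF
}

# Write to a temporary file and print it, so nothing is left behind.
target=$(mktemp)
write_dockerfile "$target"
cat "$target"
rm -f "$target"
```

With this layout, editing hw.go should only invalidate the last two steps, while the dependency download step can still come from the cache.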

The ADD Command

The ADD command in a Dockerfile allows you to import external files into a Docker image.

If the contents of all external files of the first ADD command are unchanged, the layer cache is used, and all subsequent commands up to the next ADD or COPY command also use the layer cache.

However, if the contents of one or more external files are different, then all subsequent commands will be executed without using the layer cache.

To take advantage of Layer Caching in Docker, structure your Dockerfile so that frequently changing steps, such as ADD, are located towards the end of the file. This ensures that steps which produce the same result on every build are not rebuilt unnecessarily.

About --cache-from

The --cache-from command line option of the docker build command allows you to build a new image using a pre-existing one as the cache source. You will see that in action in the next section.
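The usual pattern around --cache-from is to pull the cache image first and then build against it. The sketch below wraps that pattern in a hypothetical `build_with_cache` helper. The `run` wrapper, the DRY_RUN variable and the image names are assumptions for illustration; by default the commands are only printed, so the script can be tried without a Docker daemon.

```shell
# Sketch of the pull-then-build pattern around --cache-from.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: image to use as the cache source, $2: tag for the new image
build_with_cache() {
  # A failed pull (for example, when no image exists yet) should not
  # abort the build; Docker then simply builds without that cache.
  run docker pull "$1" || true
  run docker build --cache-from "$1" -t "$2" .
}

# Placeholder image names, not taken from a real registry.
build_with_cache "example-user/go_hw:master" "go_hw:v2"
```

Setting DRY_RUN=0 makes the helper execute the docker commands instead of printing them.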

An example Semaphore 2.0 project

The first thing you need is a secret in Semaphore 2.0 that holds your Docker Registry credentials. If that secret is called docker-hub, you can inspect it as follows:

$ sem get secrets docker-hub
apiVersion: v1beta
kind: Secret
metadata:
  name: docker-hub
  id: a2aaefdb-a4ff-4bc2-afd9-2afa9c7f3e51
  create_time: "1538456457"
  update_time: "1538456537"
data:
  env_vars:
    - name: DOCKER_USERNAME
      value: docker-username
    - name: DOCKER_PASSWORD
      value: docker-password
  files: []

The following Semaphore 2.0 project illustrates the use of Docker Layer Caching in Semaphore 2.0 projects:

version: v1.0
name: Using Docker Layer Cache
agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu1804

blocks:
  - name: Create Docker image
    task:
      jobs:
        - name: Store Docker image in Registry
          commands:
            - checkout
            - echo "$DOCKER_PASSWORD" | docker login --username "$DOCKER_USERNAME" --password-stdin
            - cp D1 Dockerfile
            - docker build -t go_hw:v1 .
            - docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
            - docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"

            - docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"
            - docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"

      secrets:
      - name: docker-hub

  - name: Use previous image
    task:
      jobs:
        - name: Use restored Docker image as cache
          commands:
            - checkout
            - docker images
            - echo "$DOCKER_PASSWORD" | docker login --username "$DOCKER_USERNAME" --password-stdin
            - cp D2 Dockerfile
            - docker pull "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
            - docker build --cache-from "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH" -t go_hw:v2 .
            - docker images
            - docker run go_hw:v2

      secrets:
      - name: docker-hub

The .semaphore/semaphore.yml file has two blocks. The first one creates a Docker image that is reused in the second block using the --cache-from command line option.

The block named "Use previous image" simulates the case where a number of unchanged layers will be reused from an image that was pulled from the Docker Registry. In this case all layers will be reused. The docker commands that use this functionality are the following:

- docker pull "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
- docker build --cache-from "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH" -t go_hw:v2 .

The docker pull command gets an existing Docker image from the Docker Registry, whereas the docker build command uses the --cache-from option to try to reuse as many of the existing layers of the "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH" image as possible.

- docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
- docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"

These two commands tag an existing Docker image and push it to the Docker Registry so that it can be found and reused as a cache image.

- docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"
- docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"

The last two docker commands should be executed when you want to deploy a Docker image to production. The first one tags an existing Docker image so that it can be associated with the Git SHA value and the Workflow ID of the Semaphore 2.0 project, and the second one pushes that image to the Docker Registry.

The contents of the D1 file are as follows:

$ cat D1
FROM golang:alpine

RUN mkdir /files
COPY hw.go /files
WORKDIR /files

RUN go build -o /files/hw hw.go

The contents of the D2 file are as follows:

$ cat D2
FROM golang:alpine

RUN mkdir /files
COPY hw.go /files
WORKDIR /files

RUN go build -o /files/hw hw.go
ENTRYPOINT ["/files/hw"]

The D2 file includes all the contents of the D1 file and adds one more line at the end, which makes the Docker image created from D1 a good cache source for the D2 build.

To improve the cache hit rate, use the SEMAPHORE_GIT_BRANCH Semaphore 2.0 environment variable as the tag. This way, each Git branch has its own cache image and you avoid cache collisions between branches.
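One practical caveat when using a branch name as a tag: branch names may contain characters such as "/" that are not allowed in Docker tags. The following sketch shows one possible way to derive a safe tag. The `sanitize_tag` helper is hypothetical and is not something Semaphore provides; it relies on the Docker rule that tags may contain letters, digits, "_", "." and "-", may not start with "." or "-", and are at most 128 characters long.

```shell
# Turn a Git branch name into a string that is safe to use as a Docker tag.
# sanitize_tag is a hypothetical helper written for this sketch.
sanitize_tag() {
  printf '%s' "$1" \
    | tr -c 'A-Za-z0-9_.-' '-' \
    | sed -e 's/^[.-]*//' \
    | cut -c 1-128
}

# Example with a made-up branch name.
tag=$(sanitize_tag "feature/layer-cache")
echo "$tag"   # prints "feature-layer-cache"
```

The resulting value could then be used in place of "$SEMAPHORE_GIT_BRANCH" in the docker tag, docker push, docker pull and --cache-from commands, as long as every command uses the same sanitized value.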
