Docker Layer Caching
This document shows how to use layer caching in Docker to make your builds faster, and how to apply it in CI/CD workflows on Semaphore.
About Layer Caching in Docker
Docker builds container images from layers. Each instruction in a Dockerfile roughly corresponds to one layer. Each layer records the filesystem changes made by its instruction, that is, the difference between the state of the image before and after the instruction was executed.
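The following toy model in Python makes the idea of stacked filesystem changes concrete. It assumes that each layer is a dictionary mapping a file path to its new contents, or to None when the instruction deleted the file. Real layers are tar archives that mark deletions with whiteout files, so treat this only as an illustration.

```python
from typing import Dict, List, Optional

# Toy representation (assumption): path -> new contents, or None for a deletion.
Layer = Dict[str, Optional[str]]


def apply_layers(layers: List[Layer]) -> Dict[str, str]:
    """Apply layers from bottom to top and return the visible filesystem."""
    fs: Dict[str, str] = {}
    for layer in layers:
        for path, contents in layer.items():
            if contents is None:
                fs.pop(path, None)  # this layer deleted the file
            else:
                fs[path] = contents  # this layer added or changed the file
    return fs


# Hypothetical layers for a three-step build.
layers: List[Layer] = [
    {"/etc/os-release": "alpine"},                    # base image
    {"/app/main.go": "v1"},                           # COPY main.go /app/
    {"/app/main": "<binary>", "/app/main.go": None},  # RUN go build ... && rm main.go
]

if __name__ == "__main__":
    print(apply_layers(layers))
```

The third layer only hides /app/main.go. The file is still stored in the second layer, which is why deleting files in a later instruction does not make an image smaller.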
Docker uses a layer cache to speed up image builds. If a step would produce a layer that already exists in the cache, Docker reuses that layer instead of executing the step again. Docker layer caching mainly concerns the RUN, COPY and ADD instructions, which are explained in more detail next.
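The lookup can be pictured as a chain of cache keys. The Python sketch below is a simplified model, not Docker's actual algorithm. It assumes that each step's key is a hash of three things: the parent layer's key, the instruction text and, for COPY and ADD, a digest of the copied files. Because every key depends on its parent, one cache miss changes every later key, so all following steps miss as well.

```python
import hashlib
from typing import List, Optional, Set, Tuple

# Simplified step model (assumption): the instruction text plus, for COPY/ADD,
# a digest of the copied files; None for steps that read no build-context files.
Step = Tuple[str, Optional[str]]


def cache_key(parent_key: str, step: Step) -> str:
    """Hash the parent layer's key, the instruction text and any file digest."""
    text, files_digest = step
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(b"\0" + text.encode())
    if files_digest is not None:
        h.update(b"\0" + files_digest.encode())
    return h.hexdigest()


def build(steps: List[Step], cache: Set[str]) -> List[str]:
    """Report 'cached' or 'built' for each step and store the new keys."""
    results: List[str] = []
    parent = ""  # FROM is treated like any other step in this toy model
    for step in steps:
        key = cache_key(parent, step)
        results.append("cached" if key in cache else "built")
        cache.add(key)
        parent = key  # a changed key here changes every later key too
    return results


if __name__ == "__main__":
    cache: Set[str] = set()
    steps: List[Step] = [
        ("FROM golang:alpine", None),
        ("COPY hw.go /files", "digest-of-hw.go-v1"),  # hypothetical digest
        ("RUN go build -o /files/hw hw.go", None),
    ]
    print(build(steps, cache))  # first build: nothing cached yet
    print(build(steps, cache))  # identical rebuild
    edited = [steps[0], ("COPY hw.go /files", "digest-of-hw.go-v2"), steps[2]]
    print(build(edited, cache))  # hw.go edited
```

In this model, a RUN step's key never reflects what the command actually does. For the same reason, Docker does not re-execute a cached RUN instruction even if, for example, the package index it downloads has changed upstream.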
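The RUN Command
The RUN instruction executes a command while the image is being built. To decide whether a cached layer can be reused, Docker only compares the command string on top of the same parent layer; it does not examine what the command would do. If a matching layer already exists in the cache, Docker reuses it, so the command is executed only the first time.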
As you will see later, a COPY or an ADD instruction can invalidate the layer cache and force Docker to execute all subsequent RUN instructions again.
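The COPY Command
The COPY instruction in a Dockerfile imports one or more external files into a Docker image. To decide whether its cached layer is still valid, Docker computes checksums of the contents of the copied files and ignores their modification and access times. This ensures that the image always contains the latest version of every relevant file.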
If the contents of all files referenced by the first COPY instruction are unchanged, the layer cache is used for that instruction and for all subsequent instructions until the next ADD or COPY instruction.
However, if the contents of one or more of those files have changed, that instruction and all subsequent ones are executed without using the layer cache.
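The following Python sketch illustrates why only content changes matter. The helper files_digest is hypothetical and much simpler than Docker's own checksum. Like Docker, though, it ignores modification and access times, so touching a file leaves the digest unchanged while editing the file changes it.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def files_digest(paths: Iterable[Path]) -> str:
    """Hash file names and contents in a stable order; timestamps are ignored."""
    h = hashlib.sha256()
    for path in sorted(paths):
        h.update(path.name.encode() + b"\0")  # separator keeps name and content apart
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()


def demo() -> Tuple[bool, bool]:
    """Return (unchanged after touch, changed after edit) for a sample file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hw.go"
        src.write_text("package main\n")
        before = files_digest([src])

        os.utime(src, (0, 0))  # change access/modification times only
        after_touch = files_digest([src])

        src.write_text("package main // edited\n")
        after_edit = files_digest([src])
    return after_touch == before, after_edit != before


if __name__ == "__main__":
    print(demo())
```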
To take advantage of layer caching, structure your Dockerfile so that frequently changing steps, such as COPY instructions for your source code, are located near the end of the file. The steps before them, whose inputs rarely change, are then reused from the cache instead of being rebuilt unnecessarily.
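The Python sketch below compares two orderings of a hypothetical Dockerfile for a Go module. The go.mod/go.sum layout is an assumption and is not part of the example later in this document. The sketch relies on the rule described above: every step before the first changed step comes from the cache, and that step plus everything after it is rebuilt. It models only the case where a source file is edited.

```python
from typing import List


def rebuilt_steps(steps: List[str], changed: str) -> List[str]:
    """Return the steps that run again when the input of `changed` changes.

    Steps before `changed` are reused from the layer cache; `changed` and
    every step after it are rebuilt.
    """
    return steps[steps.index(changed):]


# Hypothetical Dockerfile for a Go module (go.mod/go.sum are assumptions).
copy_early = [
    "FROM golang:alpine",
    "WORKDIR /src",
    "COPY . .",                  # source edits invalidate this step...
    "RUN go mod download",       # ...so dependencies are downloaded again
    "RUN go build -o /app .",
]

copy_late = [
    "FROM golang:alpine",
    "WORKDIR /src",
    "COPY go.mod go.sum ./",     # changes only when dependencies change
    "RUN go mod download",       # stays cached across source edits
    "COPY . .",                  # frequently changing step near the end
    "RUN go build -o /app .",
]

if __name__ == "__main__":
    # Editing a .go source file first affects the "COPY . ." step.
    print(rebuilt_steps(copy_early, "COPY . ."))
    print(rebuilt_steps(copy_late, "COPY . ."))
```

With copy_late, editing a source file no longer re-runs go mod download, because that step depends only on go.mod and go.sum.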
The ADD Command
The ADD instruction in a Dockerfile allows you to import external files into a Docker image.
If the contents of all files referenced by the first ADD instruction are unchanged, the layer cache is used for that instruction and for all subsequent instructions until the next ADD or COPY instruction.
However, if the contents of one or more of those files have changed, that instruction and all subsequent ones are executed without using the layer cache.
As with COPY, structure your Dockerfile so that frequently changing ADD steps are located near the end of the file. The steps before them are then not rebuilt unnecessarily.
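About --cache-from
The --cache-from option of the docker build command allows you to build a new image using a pre-existing image, such as one pulled from a Docker Registry, as a cache source. With the classic builder, the cache image has to be present locally, which is why it is usually pulled first. If your Docker installation builds with BuildKit (the default since Docker Engine 23.0), the cache image must also contain inline cache metadata. You can add this metadata by building the image with --build-arg BUILDKIT_INLINE_CACHE=1. You will see --cache-from in more detail in the next section.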
An example Semaphore 2.0 project
First, create a secret in Semaphore 2.0 that holds your Docker Registry credentials. In this example, the secret is named docker-hub, and you can display it as follows:
$ sem get secrets docker-hub
apiVersion: v1beta
kind: Secret
metadata:
  name: docker-hub
  id: a2aaefdb-a4ff-4bc2-afd9-2afa9c7f3e51
  create_time: "1538456457"
  update_time: "1538456537"
data:
  env_vars:
  - name: DOCKER_USERNAME
    value: docker-username
  - name: DOCKER_PASSWORD
    value: docker-password
  files: []
The following illustrates the use of Docker Layer Caching in Semaphore 2.0 projects:
version: v1.0
name: Using Docker Layer Cache
agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu2004
blocks:
  - name: Create Docker image
    task:
      jobs:
        - name: Store Docker image in Registry
          commands:
            - checkout
            - echo "$DOCKER_PASSWORD" | docker login --username "$DOCKER_USERNAME" --password-stdin
            - cp D1 Dockerfile
            # BUILDKIT_INLINE_CACHE=1 embeds cache metadata so BuildKit can later use this image with --cache-from
            - docker build --build-arg BUILDKIT_INLINE_CACHE=1 -t go_hw:v1 .
            - docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
            - docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
            - docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"
            - docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"
      secrets:
        - name: docker-hub
  - name: Use previous image
    task:
      jobs:
        - name: Use restored Docker image as cache
          commands:
            - checkout
            - docker images
            - echo "$DOCKER_PASSWORD" | docker login --username "$DOCKER_USERNAME" --password-stdin
            - cp D2 Dockerfile
            - docker pull "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
            # The image reference is USERNAME/REPOSITORY:TAG, the same one that was pulled above
            - docker build --cache-from "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH" -t go_hw:v2 .
            - docker images
            - docker run go_hw:v2
      secrets:
        - name: docker-hub
The .semaphore/semaphore.yml file has two blocks. The first one creates a Docker image, and the second block reuses that image as a cache through the --cache-from option of docker build.
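The block named "Use previous image" simulates the case where unchanged layers are reused from an image that was pulled from the Docker Registry. Each Semaphore job starts in a fresh environment with an empty local layer cache, so the cache image has to come from the registry. In this case, all layers built from D1 are reused, and only the ENTRYPOINT instruction that D2 adds is new. The following docker commands use this functionality: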
- docker pull "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
- docker build --cache-from "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH" -t go_hw:v2 .
The docker pull command downloads the existing Docker image from the Docker Registry. The docker build command then uses the --cache-from option to reuse as many of the existing layers of the $DOCKER_USERNAME/go_hw:$SEMAPHORE_GIT_BRANCH image as possible.
- docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
- docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_BRANCH"
These two commands, which run in the first block, tag the image built from D1 with the branch name and push it to the Docker Registry. Later builds on the same branch can then find it there and reuse it as a cache image.
- docker tag go_hw:v1 "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"
- docker push "$DOCKER_USERNAME"/go_hw:"$SEMAPHORE_GIT_SHA"-"$SEMAPHORE_WORKFLOW_ID"
The last two docker commands should be executed when you want to deploy a Docker image to production. The first one tags the image so that it can be associated with the Git SHA and the Workflow ID of the Semaphore 2.0 workflow. The second one pushes that image to the Docker Registry.
The contents of the D1 file are as follows:
$ cat D1
FROM golang:alpine
RUN mkdir /files
COPY hw.go /files
WORKDIR /files
RUN go build -o /files/hw hw.go
The contents of the D2 file are as follows:
$ cat D2
FROM golang:alpine
RUN mkdir /files
COPY hw.go /files
WORKDIR /files
RUN go build -o /files/hw hw.go
ENTRYPOINT ["/files/hw"]
The D2 file contains everything in the D1 file and adds one more line at the end. This makes the image built from D1 an ideal cache source for building D2.
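Both blocks check out the same commit, so hw.go has the same contents in both builds, and the comparison reduces to matching instructions. The Python sketch below is a simplification that ignores file checksums for that reason. It counts how many leading instructions of D2 match D1 and can therefore come from the cache.

```python
from typing import List

# Instructions from the D1 and D2 files shown above.
D1 = [
    "FROM golang:alpine",
    "RUN mkdir /files",
    "COPY hw.go /files",
    "WORKDIR /files",
    "RUN go build -o /files/hw hw.go",
]
D2 = D1 + ['ENTRYPOINT ["/files/hw"]']


def reusable_prefix(cached: List[str], new: List[str]) -> int:
    """Count leading instructions of `new` that match `cached` exactly.

    Simplification: assumes files copied by COPY are unchanged, which holds
    here because both blocks check out the same commit.
    """
    count = 0
    for old_step, new_step in zip(cached, new):
        if old_step != new_step:
            break
        count += 1
    return count


if __name__ == "__main__":
    reused = reusable_prefix(D1, D2)
    print(f"{reused} of {len(D2)} instructions can come from the cache")
```

Running it prints "5 of 6 instructions can come from the cache". Only the final ENTRYPOINT instruction has no cached counterpart.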
To improve cache hits, use the SEMAPHORE_GIT_BRANCH Semaphore 2.0 environment variable as the image tag. This way, each branch has its own cache image, and builds on different branches do not overwrite each other's cache.
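One caveat applies when the branch name is used as a tag. Git branch names may contain characters such as / that Docker tags do not allow. A tag may only use ASCII letters, digits, underscores, periods and hyphens, must not start with a period or a hyphen, and is limited to 128 characters. The Python helper below is hypothetical and shows one way to map a branch name to a valid tag. The fallback value "branch" is an arbitrary choice.

```python
import re

# Characters that Docker does not allow in a tag.
_INVALID_TAG_CHARS = re.compile(r"[^A-Za-z0-9_.-]")
_MAX_TAG_LENGTH = 128


def branch_to_tag(branch: str) -> str:
    """Map a Git branch name to a valid Docker tag (hypothetical helper)."""
    tag = _INVALID_TAG_CHARS.sub("-", branch)  # e.g. feature/login -> feature-login
    tag = tag.lstrip(".-")                     # a tag may not start with '.' or '-'
    tag = tag[:_MAX_TAG_LENGTH]                # enforce the length limit
    return tag or "branch"                     # arbitrary fallback for empty results


if __name__ == "__main__":
    for name in ["master", "feature/login", "release/v1.2"]:
        print(name, "->", branch_to_tag(name))
```

Different branch names can map to the same tag, for example feature/login and feature-login. If that matters for your repository, consider appending a short hash of the original branch name.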