Setting up caching with AWS S3#

When running a Semaphore agent in a self-hosted environment, cache storage in hosted jobs is not available. However, you can use an AWS S3 bucket to store our cache dependencies instead. In order to do so, the cache CLI needs access to perform a few actions on an S3 bucket in your AWS account.

Creating AWS resources#

The instructions that follow assume you already have the AWS CLI properly installed and configured on your personal machine. If you’re not sure how to configure it, you can follow this tutorial from AWS. You can also create the AWS resources described in this guide using the AWS console.

The bucket name we will use in this example is semaphore-cache and the AWS region will be us-east-1. Make sure you adjust the commands to match your bucket name and region of choice.

1. Create your AWS S3 bucket and block public access to it:

This is the bucket that the cache CLI will use to store and retrieve archives.

aws s3api create-bucket \
  --region us-east-1 \
  --bucket semaphore-cache

aws s3api put-public-access-block \
  --bucket semaphore-cache \
  --public-access-block-configuration \
BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

If you are using the aws-agent-stack, this is the only thing you need to do. After creating the bucket, set the SEMAPHORE_AGENT_CACHE_BUCKET_NAME and you are all set.

2. Create an AWS IAM policy

It is always good practice to give services only the permissions that they need. The cache CLI only needs access to perform the following AWS S3 actions:

  • s3:PutObject
  • s3:GetObject
  • s3:ListBucket
  • s3:DeleteObject

One way to grant access for only these actions is through an AWS IAM policy. You can create one with the following commands:

cat > /tmp/semaphore-cache-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "statement1",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::semaphore-cache/*",
        "arn:aws:s3:::semaphore-cache"
      ]
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name semaphore-cache-policy \
  --policy-document file:///tmp/semaphore-cache-policy.json \
  --description "Restricts access to some operations on the semaphore-cache S3 bucket"

The aws iam create-policy command will generate an output like the one shown below:

{
  "Policy": {
    "PolicyName": "semaphore-cache-policy",
    "CreateDate": "2015-06-01T19:31:18.620Z",
    "AttachmentCount": 0,
    "IsAttachable": true,
    "PolicyId": "ZXR6A36LTYANPAI7NJ5UV",
    "DefaultVersionId": "v1",
    "Path": "/",
    "Arn": "arn:aws:iam::0123456789012:policy/semaphore-cache-policy",
    "UpdateDate": "2015-06-01T19:31:18.620Z"
  }
}

Make sure you save the Arn of the AWS IAM policy. You'll use this value when attaching the AWS IAM policy to the AWS IAM user.

3. Create the AWS IAM user and its access keys:

This is the AWS IAM user that the cache CLI will use to authorize and access to your AWS bucket:

aws iam create-user --user-name semaphore
aws iam create-access-key --user-name semaphore

The aws iam create-access-key command will generate an output like the one shown below:

{
  "AccessKey": {
    "UserName": "semaphore",
    "Status": "Active",
    "CreateDate": "...",
    "SecretAccessKey": "...",
    "AccessKeyId": "..."
  }
}

Make sure you save the SecretAccessKey and the AccessKeyId values. We will need them when configuring the ~/.aws folder in the agent machine.

4. Attach the AWS IAM policy to the AWS IAM user:

Let's attach the AWS IAM policy we created in step #2 to the user we created in step #3:

aws iam attach-user-policy \
  --user-name semaphore \
  --policy-arn "[this is where the policy ARN you saved on step #2 needs to go]"

Configuring your machine#

Once you have all the AWS resources created, let's jump to the machine running the agent and configure the AWS credentials for the cache CLI to use.

1. Configuring the ~/.aws folder:

Make sure you use the SecretAccessKey and AccessKeyId for the proper AWS IAM user. Also, make sure you replace us-east-1 with your region of choice and semaphore-cache with your bucket name.

mkdir ~/.aws

cat > ~/.aws/config <<EOF
[default]
region = us-east-1
output = json
EOF

cat > ~/.aws/credentials <<EOF
[default]
aws_access_key_id = SecretAccessKey
aws_secret_access_key = AccessKeyId
EOF

Note that the ~/.aws folder should be created in the home directory of the user running the agent.

2. Updating the agent configuration:

Finally, let's instruct the agent to pass the required environment variables to the cache CLI, by adding them to your agent's config.yaml:

env-vars:
  - SEMAPHORE_CACHE_BACKEND=s3
  - SEMAPHORE_CACHE_S3_BUCKET=semaphore-cache

We're all set. After restarting your agent, you should be able to start using the cache CLI with your AWS S3 bucket in your pipelines.