Autoscale GitLab CI on AWS ECS Fargate

Autoscale your GitLab CI

Harshvijaythakkar
9 min read · Jan 5, 2023

A colleague and I were trying to run GitLab Runner to execute our CI jobs. We explored multiple options for running GitLab Runner on AWS and found that running it on ECS Fargate is the optimal solution.

Amazon ECS is a fully managed container orchestration service that makes it easy for you to deploy, manage, and scale containerized applications. Run and scale your container workloads across availability zones, in the cloud, and on-premises, without the complexity of managing a control plane or nodes.

In this blog post we will discuss how you can autoscale your GitLab CI by running your CI jobs on AWS ECS Fargate. As an example, I will use Terraform to create AWS infrastructure when a commit is made to a branch. GitLab Runner will execute the job(s) from the .gitlab-ci.yml file by running a container on an AWS ECS Fargate cluster, and once the work for a particular job is done, the container is terminated.

AWS Fargate pricing is calculated based on the vCPU, memory, operating system, CPU architecture, and storage resources used from the time you start to download your container image until the Amazon ECS task terminates, rounded up to the nearest second.
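As a rough back-of-the-envelope example, suppose a CI job runs a task with 0.5 vCPU and 1 GB of memory for 300 seconds. The rates below are example us-east-1 Linux/x86 on-demand rates at the time of writing; verify them against the current Fargate pricing page for your region:

```shell
# Example Fargate on-demand rates (us-east-1, Linux/x86); check the
# current AWS pricing page before relying on these numbers.
VCPU_RATE=0.04048     # USD per vCPU-hour
MEM_RATE=0.004445     # USD per GB-hour

VCPU=0.5              # task vCPU
MEM_GB=1              # task memory in GB
DURATION_S=300        # billed seconds (image pull start -> task stop)

COST=$(awk -v v="$VCPU" -v m="$MEM_GB" -v s="$DURATION_S" \
           -v vr="$VCPU_RATE" -v mr="$MEM_RATE" \
           'BEGIN { printf "%.6f", (v * vr + m * mr) * s / 3600 }')
echo "Estimated job cost: \$${COST}"
```

For this 5-minute job the estimate comes to a fraction of a cent, which is why per-job Fargate tasks are attractive for bursty CI workloads.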

Architecture Diagram:

https://docs.gitlab.com/runner/configuration/img/runner_fargate_driver_ssh.png

In the architecture diagram you can see that the GitLab Runner, along with the Fargate driver (a custom executor), is running on an EC2 instance (you can run it in a container as well). The SSH keys are automatically managed by the Fargate driver; the container must be able to accept a key from the SSH_PUBLIC_KEY environment variable.

The GitLab custom executor driver for AWS Fargate automatically launches a container on the Amazon Elastic Container Service (ECS) to execute each GitLab CI job.

Step 1: Create IAM role for EC2 Instance

  1. Go to the IAM Console and click on Create Role
  2. Trusted entity type -> EC2 and Use case -> EC2 and click on Next
  3. Permissions -> AmazonECS_FullAccess and click on Next
  4. Give a meaningful name and click on Create Role
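If you prefer the CLI, the same role can be created with commands along these lines (the role/profile name is a placeholder; note that EC2 attaches roles through an instance profile):

```shell
# Hypothetical role name; mirrors the console steps above.
ROLE_NAME=gitlab-runner-ec2-role

# Trust policy letting EC2 assume the role.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AmazonECS_FullAccess

# EC2 attaches roles via an instance profile.
aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
aws iam add-role-to-instance-profile --instance-profile-name "$ROLE_NAME" \
    --role-name "$ROLE_NAME"
```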

Step 2: Create Security Group

  1. Go to EC2 Console, click on Security groups and click on create new security Group
  2. Give name and description
  3. Select the VPC (This should be same as your VPC in which you will be creating EC2 Instance)
  4. Click Add Inbound Rule and add SSH rule (You can allow access from anywhere but as a best practice you should restrict the access to particular IP or IP ranges or VPC CIDR)
  5. Click on Create Security Group
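The equivalent CLI sketch (the VPC ID and CIDR below are placeholders; restrict the CIDR to your own IP range rather than 0.0.0.0/0):

```shell
VPC_ID=vpc-0123456789abcdef0   # placeholder: the VPC used for the EC2 instance
MY_CIDR=203.0.113.0/24         # placeholder: your admin IP range

SG_ID=$(aws ec2 create-security-group \
    --group-name gitlab-runner-sg \
    --description "SSH access to GitLab Runner host" \
    --vpc-id "$VPC_ID" \
    --query GroupId --output text)

# Allow inbound SSH only from the trusted range.
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port 22 --cidr "$MY_CIDR"
```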

Step 3: Create EC2 Instance for GitLab Runner and Fargate Driver

  1. Go to EC2 Console and click on Launch Instance
  2. Give Name and add additional tags if required
  3. Select Amazon Linux 2 AMI
  4. Select t2.micro Instance Type
  5. Select Your Key Pair or create new Key Pair
  6. Edit Networking section and Select VPC, Subnet and Existing Security Group (created in step 2), Assign IAM Role (created in step 1)
  7. Click Launch Instance
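For reference, a CLI equivalent of the launch (every ID below is a placeholder; use the Amazon Linux 2 AMI for your region, the security group from step 2, and the instance profile from step 1):

```shell
INSTANCE_TYPE=t2.micro   # placeholder sizing; any small instance works

aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type "$INSTANCE_TYPE" \
    --key-name my-key-pair \
    --subnet-id subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --iam-instance-profile Name=gitlab-runner-ec2-role \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=gitlab-runner}]'
```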

Step 4: Build Docker Image and push to ECR

The Docker image should include GitLab Runner, which handles artifacts and caching, along with our required dependencies and packages.

  • Our GitLab repo structure:
terraform-test
|_ custum-fargate-executor
   |_ Dockerfile
   |_ docker-entrypoint.sh
|_ modules
   |_ EC2
   |_ RDS
   |_ S3
|_ .gitlab-ci.yml
|_ .gitignore
|_ provider.tf
|_ backend.tf
|_ s3.tf
  • Open the file terraform-test/custum-fargate-executor/Dockerfile and add the following content:
FROM debian:buster

# ---------------------------------------------------------------------
# Install https://github.com/krallin/tini - a very small 'init' process
# that helps processing signals sent to the container properly.
# ---------------------------------------------------------------------
ARG TINI_VERSION=v0.19.0

RUN apt-get update && \
    apt-get install -y curl && \
    curl -Lo /usr/local/bin/tini https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-amd64 && \
    chmod +x /usr/local/bin/tini

# --------------------------------------------------------------------------
# Install and configure sshd.
# https://docs.docker.com/engine/examples/running_ssh_service for reference.
# --------------------------------------------------------------------------
RUN apt-get install -y openssh-server && \
    # Creating /run/sshd instead of /var/run/sshd, because in the Debian
    # image /var/run is a symlink to /run. Creating /var/run/sshd directory
    # as proposed in the Docker documentation linked above just doesn't
    # work.
    mkdir -p /run/sshd

EXPOSE 22

# ----------------------------------------
# Install GitLab CI required dependencies.
# ----------------------------------------
ARG GITLAB_RUNNER_VERSION=v12.9.0

RUN curl -Lo /usr/local/bin/gitlab-runner https://gitlab-runner-downloads.s3.amazonaws.com/${GITLAB_RUNNER_VERSION}/binaries/gitlab-runner-linux-amd64 && \
    chmod +x /usr/local/bin/gitlab-runner && \
    # Test if the downloaded file was indeed a binary and not, for example,
    # an HTML page representing S3's internal server error message or something
    # like that.
    gitlab-runner --version

RUN apt-get install -y bash ca-certificates git git-lfs && \
    git lfs install --skip-repo

# ----------------------------------------
# Install Terraform required dependencies.
# ----------------------------------------

RUN apt update -y && \
    apt install software-properties-common gnupg2 curl -y && \
    curl https://apt.releases.hashicorp.com/gpg | gpg --dearmor > hashicorp.gpg && \
    install -o root -g root -m 644 hashicorp.gpg /etc/apt/trusted.gpg.d/ && \
    apt-add-repository "deb [arch=$(dpkg --print-architecture)] https://apt.releases.hashicorp.com focal main" && \
    apt update -y && \
    apt install terraform=1.3.6 -y && \
    apt-get install awscli -y

RUN apt update -y && \
    apt-get install binutils -y && \
    strings --help

RUN terraform --version

# -------------------------------------------------------------------------------------
# Execute a startup script.
# https://success.docker.com/article/use-a-script-to-initialize-stateful-container-data
# for reference.
# -------------------------------------------------------------------------------------
COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh

RUN chmod +x /usr/local/bin/docker-entrypoint.sh

ENTRYPOINT ["tini", "--", "/usr/local/bin/docker-entrypoint.sh"]

Note:- We use https://gitlab.com/tmaczukin-test-projects/fargate-driver-debian/-/tree/master as the base Dockerfile and added the dependencies needed to install Terraform (you can modify the Dockerfile based on your requirements).

  • Open the file terraform-test/custum-fargate-executor/docker-entrypoint.sh and add the following content:
#!/bin/sh

# Create a folder to store user's SSH keys if it does not exist.
USER_SSH_KEYS_FOLDER=~/.ssh
[ ! -d ${USER_SSH_KEYS_FOLDER} ] && mkdir -p ${USER_SSH_KEYS_FOLDER}

# Copy contents from the `SSH_PUBLIC_KEY` environment variable
# to the `$USER_SSH_KEYS_FOLDER/authorized_keys` file.
# The environment variable must be set when the container starts.
echo ${SSH_PUBLIC_KEY} > ${USER_SSH_KEYS_FOLDER}/authorized_keys

# Clear the `SSH_PUBLIC_KEY` environment variable.
unset SSH_PUBLIC_KEY

# Start the SSH daemon
exec /usr/sbin/sshd -D

Note:- Do not modify the docker-entrypoint.sh file; doing so might break things.

  • Build the Docker image using the following command:
docker build -t <name_of_docker_image> <path_to_Dockerfile>
  • Push Image to ECR:
  1. Create Repository on ECR
  2. Click on Repository name and click on view push commands (follow the steps and push your docker image to ECR)
  3. Click on Image tag and note down the URI (This URI will be used in Fargate Task Creation)
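The push commands shown in the ECR console look roughly like the following (the account ID, region, and repository name are placeholders):

```shell
AWS_ACCOUNT_ID=123456789012            # placeholder
AWS_REGION=us-east-1                   # placeholder
REPO=custum-fargate-executor
ECR_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${REPO}"

# Authenticate Docker against the private registry.
aws ecr get-login-password --region "$AWS_REGION" |
  docker login --username AWS --password-stdin \
    "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

# Build from the Dockerfile above, then tag and push.
docker build -t "$REPO" ./custum-fargate-executor
docker tag "${REPO}:latest" "${ECR_URI}:latest"
docker push "${ECR_URI}:latest"
```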

Step 5: Create Fargate Task

  1. Go to the ECS Console and then click Task Definitions
  2. Click Create new Task Definition
  3. Choose FARGATE and click Next step
  4. Give Name (Note: The name will be used in fargate.toml file)
  5. Assign a Task Role (this will be used by containers to make AWS API calls). This is mandatory.
    a. This role should have the AmazonEC2ContainerServiceRole policy
  6. Operating system family -> Linux
  7. Assign Execution Role (This will be used by ECS container agent to make AWS API calls)
  8. Select values for Task memory (GB) and Task CPU (vCPU)
  9. Click Add container. Then:
  • Name it ci-coordinator (do not change this name), so the Fargate driver can inject the SSH_PUBLIC_KEY environment variable.
  • Define image (Created in Step 4, write URI of ECR)
  • Define port mapping for 22/TCP
  • Click Add

10. Click Create
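The same task definition can be registered from the CLI; the JSON below is a sketch with placeholder ARNs and image URI. Note the container is named ci-coordinator and maps port 22, as the Fargate driver requires:

```shell
cat > taskdef.json <<'EOF'
{
  "family": "gitlab-runner-ci-task",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::123456789012:role/gitlab-ci-task-role",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [{
    "name": "ci-coordinator",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/custum-fargate-executor:latest",
    "essential": true,
    "portMappings": [{ "containerPort": 22, "protocol": "tcp" }]
  }]
}
EOF

aws ecs register-task-definition --cli-input-json file://taskdef.json
```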

Step 6: Sample Terraform Files

Terraform requires permission to create AWS resources on our behalf. There are different ways to provide credentials to Terraform; I will use assume_role to supply AWS credentials.

  1. Create IAM role for terraform to assume and provision AWS Resources
  • Go to IAM console and click on Roles
  • Click on Create Role
  • Trusted entity type -> AWS Account and, An AWS account -> This Account (your_account_number will be displayed) and click Next
  • Add permissions -> AdministratorAccess Policy (We need to give Admin access to terraform, you can select your own custom policy if required)
  • Give name to role (Example: terraform-assume-role)
  • Click on Create Role

Once the role is created, any role/user in your AWS account can assume it and gain admin permissions. If you want to limit access, update the role's trust relationship as needed. (For example, you can allow only the ECS task role to assume this role by setting the Principal to arn:aws:iam::<your_aws_account_id>:role/ecsTaskExecutionRole.)
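For example, a trust policy restricted to the ECS task execution role might look like this (the account ID is a placeholder):

```shell
cat > terraform-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

# Replace the role's trust relationship with the restricted policy.
aws iam update-assume-role-policy --role-name terraform-assume-role \
    --policy-document file://terraform-trust.json
```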

2. Open the file terraform-test/backend.tf and add the following content:

terraform {
  required_version = ">= 0.13"

  backend "s3" {
    region   = "<s3_bucket_region>"
    encrypt  = true
    bucket   = "<s3_bucket_name>"
    key      = "<terraform_state_file_name>"
    role_arn = "<role_arn_created_in_step_6.1>"
  }
}

3. Open the file terraform-test/provider.tf and add the following content:

terraform {
  required_version = ">= 1.0.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = "<your_aws_region>"
  assume_role {
    role_arn     = "<role_arn_created_in_step_6.1>"
    session_name = "Terraform_session"
  }
}

4. Open the file terraform-test/s3.tf and add the following content:

data "aws_caller_identity" "current" {}

resource "aws_s3_bucket" "test_bucket" {
  bucket = "<unique_bucket_name>"
  tags = {
    Environment = "Dev"
  }
}

Step 7: Create ECS Cluster

  1. Go to ECS Console
  2. Click Create Cluster
  3. Choose Networking only type. Click Next step
  4. Give Name (Note: The name will be used in fargate.toml)
  5. Click Create
  6. Click View cluster. Click Update Cluster button
  7. Next to Default capacity provider strategy, click Add another provider and choose FARGATE. Click Update
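Or from the CLI (the cluster name is a placeholder; it must match the Cluster value you later put in fargate.toml):

```shell
CLUSTER_NAME=gitlab-ci-fargate   # placeholder; reuse this name in fargate.toml

aws ecs create-cluster --cluster-name "$CLUSTER_NAME" \
    --capacity-providers FARGATE \
    --default-capacity-provider-strategy capacityProvider=FARGATE,weight=1
```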

Step 8: Configure EC2 Instance

  1. SSH into EC2 Instance and become root user
ssh -i <path_to_key_pair> ec2-user@<IP_address_of_EC2>

sudo su -

2. Run the following commands to install gitlab-runner:

sudo mkdir -p /opt/gitlab-runner/{metadata,builds,cache}

sudo curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.rpm.sh" | sudo bash

sudo yum install gitlab-runner -y

3. Go to your GitLab project’s Settings > CI/CD and expand the Runners section. Under Set up a specific Runner manually, note the registration token

4. Register runner using following command

sudo gitlab-runner register --url <your_gitlab_url> --registration-token <your_registration_token> --name fargate-test-runner --run-untagged --executor custom -n

This command creates the config.toml file; the content shown below is generated by the registration command. Do not change it.

concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "fargate-test-runner"
  url = "<your_gitlab_url>"
  token = "<your_registration_token>"
  executor = "custom"

5. Run sudo vim /etc/gitlab-runner/config.toml and add the following content, but do not modify the keys/values already written by gitlab-runner:

log_level = "debug"
concurrent = 1
check_interval = 0

[session_server]
  listen_address = "[::]:8093"
  session_timeout = 1800

[[runners]]
  name = "fargate-test-runner"
  url = "<your_gitlab_url>"
  token = "<your_registration_token>"
  token_obtained_at = ""
  token_expires_at = ""
  executor = "custom"
  builds_dir = "/opt/gitlab-runner/builds"
  cache_dir = "/opt/gitlab-runner/cache"
  clone_url = "<if_you_have_custom_gitlab_url_or_else_remove_clone_url_property>"
  [runners.custom_build_dir]
    enabled = true
  [runners.custom]
    config_exec = "/opt/gitlab-runner/fargate"
    config_args = ["--config", "/etc/gitlab-runner/fargate.toml", "custom", "config"]
    prepare_exec = "/opt/gitlab-runner/fargate"
    prepare_args = ["--config", "/etc/gitlab-runner/fargate.toml", "custom", "prepare"]
    run_exec = "/opt/gitlab-runner/fargate"
    run_args = ["--config", "/etc/gitlab-runner/fargate.toml", "custom", "run"]
    cleanup_exec = "/opt/gitlab-runner/fargate"
    cleanup_args = ["--config", "/etc/gitlab-runner/fargate.toml", "custom", "cleanup"]

Note:- In our case we used the clone_url property (which overrides the URL for the GitLab instance and is used only when the runner can't connect to the GitLab URL) because our GitLab instance has a different URL for cloning the repo.

6. Run sudo vim /etc/gitlab-runner/fargate.toml and add the following content:

LogLevel = "info"
LogFormat = "text"

[Fargate]
Cluster = "<your_ECS_cluster_name>"
Region = "<ECS_cluster_region>"
Subnet = "<subnetId_in_which_you_want_to_place_fargate_task>"
SecurityGroup = "<securityGroupId_which_you_want_to_attach_to_fargate_task>"
TaskDefinition = "<task_definition_name>:<revision>"
EnablePublicIP = false

[TaskMetadata]
Directory = "/opt/gitlab-runner/metadata"

[SSH]
Username = "root"
Port = 22

Notes:

  • Give the exact name of the Cluster, the TaskDefinition, and the revision number (if a revision number is not specified, the latest active revision is used)
  • You can use the same subnet ID that you used while creating the EC2 instance in Step 3
  • You can use the same security group ID that you used for the EC2 instance (if you have enabled public SSH access, or the SG has a rule allowing access on port 22 from the VPC CIDR), or create a new security group and make sure the EC2 instance's security group is allowed on port 22 in the new group. (The Fargate driver running on EC2 requires SSH access to your container, i.e. the Fargate task, so make sure you add the correct SG ID.)
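If you create a dedicated security group for the tasks, the rule allowing SSH from the runner's EC2 security group can be added like this (both group IDs are placeholders):

```shell
TASK_SG=sg-0aaaaaaaaaaaaaaaa     # placeholder: SG attached to the Fargate task
RUNNER_SG=sg-0bbbbbbbbbbbbbbbb   # placeholder: SG of the EC2 runner host

# Allow the runner host to SSH into the task container.
aws ec2 authorize-security-group-ingress --group-id "$TASK_SG" \
    --protocol tcp --port 22 --source-group "$RUNNER_SG"
```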

7. Install the Fargate driver:

sudo curl -Lo /opt/gitlab-runner/fargate "https://gitlab-runner-custom-fargate-downloads.s3.amazonaws.com/latest/fargate-linux-amd64"
sudo chmod +x /opt/gitlab-runner/fargate

8. Check status of gitlab-runner service

sudo systemctl status gitlab-runner.service

Step 9: Test

Your configuration should now be ready to use.

  1. In your GitLab project, create a simple .gitlab-ci.yml file and add the following content:
before_script:
  - ps -ef
  - xargs --null --max-args=1 echo < /proc/1/environ
  - cat /root/.profile
  - export $(strings /proc/1/environ | grep AWS_CONTAINER_CREDENTIALS_RELATIVE_URI)
  - echo $AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
  - aws sts get-caller-identity
  - terraform --version
  - terraform init

stages:
  - validate
  - plan
  - apply

validate:
  stage: validate
  script:
    - terraform validate

plan:
  stage: plan
  script:
    - terraform plan -out "planfile"
  dependencies:
    - validate
  artifacts:
    paths:
      - planfile

apply:
  stage: apply
  script:
    - terraform apply -input=false "planfile"
  dependencies:
    - plan

Note:- The export $(strings /proc/1/environ | grep AWS_CONTAINER_CREDENTIALS_RELATIVE_URI) command in the before_script section is very important. Do not remove it.

When a container starts on ECS Fargate, only the process running as PID 1 has permission to make AWS API calls; in our case docker-entrypoint.sh runs as PID 1 (you can see this in the output of the ps -ef command). As the architecture diagram shows, the Fargate custom executor driver first SSHes into the container (so its commands run under a different PID, because a new shell is started) and then sends commands to the container. We therefore need to export the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI variable from PID 1's environment so that every process running in the container can make AWS API calls.

As mentioned above, GitLab Runner creates a container for every job (stage), which is why we added a few lines to the before_script section of the .gitlab-ci.yml file. When a job executes, it first runs the commands in before_script and then the commands in the job (stage) itself.

Congratulations!!! 🥳

You have successfully configured your CI pipeline to Autoscale and it will run CI Jobs on AWS ECS Fargate Cluster.


Reference:

  1. https://docs.gitlab.com/runner/configuration/runner_autoscale_aws_fargate/
  2. https://gitlab.com/tmaczukin-test-projects/fargate-driver-debian/-/tree/master
  3. https://docs.gitlab.com/runner/executors/custom.html
  4. https://docs.gitlab.com/runner/configuration/advanced-configuration.html
  5. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
  6. https://stackoverflow.com/questions/57078607/ecs-fargate-task-not-applying-role
