How to: Autoscaling GitLab Continuous Integration runners on GCP 🤓

Test runner servers can consume a lot of resources, which gets expensive. Even more problematic is a filled-up queue of pipelines that blocks releases.

At Luminum Solutions, we run some relatively heavy test suites. The problem is that runner servers can consume a lot of resources, which gets expensive. Even more problematic is a filled-up queue of pipelines waiting to run: it can block Merge Requests (GitLab's equivalent of a Pull Request), which in turn blocks the release of new features.

So ideally you'd have enough computing power running for all test suites, without paying too much. This is exactly what autoscaling infrastructure can bring!

The idea is to create a Compute Engine managed instance group that automatically scales up when its instances get above a certain CPU usage. A managed instance group lets you specify a template for the instances that run as part of the group.

Step 1. Creating the image

Assuming you have a Google Cloud project (if not, create one here), the first step is to create a custom image that can be used in the template. The image is based on Ubuntu 16.04 in my case.

Create the VM

To create the image, start a Compute Engine VM. We use VMs with 2 CPUs and 4 GB of RAM, but you're free to choose what you prefer during this step. You can make the disk quite small too, since the VMs will be ephemeral. I chose standard 20 GB hard disks.

This is what it should look like (sorry for the Dutch in there):

[Screenshot: Creation of Compute Engine VM]

Install the Gitlab Runner

After creating your VM, SSH into it and install the Gitlab Runner (instructions for Ubuntu are here). We don't use the Docker executor, even though most of the attention-heavy infrastructure we run is on Docker. This is because so-called Docker in Docker scenarios can cause problems that are outside of this post's scope.

Set up a cron job to clean images

Since the hard disk on our VM is not that big and Docker images can accumulate and take up quite a lot of space, it's a good idea to automatically clean up the unused Docker images every 24 hours. Just to be sure :)

You can save the following script to /etc/cron.daily/docker-auto-purge and make it executable with chmod +x /etc/cron.daily/docker-auto-purge.

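#!/bin/sh

# Try to remove all Docker images; images still used by a container are skipped with an error.
# -r (GNU xargs) skips the docker rmi call when there are no images at all.
docker images -q | xargs -r docker rmi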

If you want to specify the number of jobs that may run concurrently on each instance, you can edit the /etc/gitlab-runner/config.toml file and change the following lines to whatever you prefer:

concurrent = 1 # Concurrent jobs for this instance (0 does not mean unlimited!)
check_interval = 0 # How often to check GitLab for new builds (seconds)
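
If you'd rather script this change than edit the file by hand, a minimal sketch could look like the one below. It assumes config.toml contains a top-level `concurrent = N` line like the one shown above. The helper name `set_concurrent` is made up for this example, and it only rewrites text, so reading and writing the actual file is left to you.

```python
import re


def set_concurrent(config_text, jobs):
    """Return config_text with the top-level `concurrent` value replaced.

    Hypothetical helper: assumes a line that starts with `concurrent =`.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    pattern = re.compile(r"^concurrent\s*=\s*\d+", re.MULTILINE)
    new_text, count = pattern.subn("concurrent = %d" % jobs, config_text)
    if count == 0:
        raise ValueError("no `concurrent = N` line found")
    return new_text


sample = "concurrent = 1\ncheck_interval = 0\n"
print(set_concurrent(sample, 2))
```

Anything after the number on that line, such as a trailing comment, is left untouched.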

Step 2. Create the image and template based on your VM

To actually set up your VM to be managed inside of an instance group, we'll have to create a blueprint for it. After this, we'll remove the VM and allow the Instance Group to create it based on the blueprint, and manage it.

Create the image

First, stop your VM. This will allow for safer passage through the land of image creation we're about to enter.

Select 'Compute Engine' from the sidebar in the Google Cloud Console and click 'Images'. Here, create a new image based on the hard disk (use it as the source disk) of your recently created VM.

With your image now created, you can easily create VMs based on it! It's exactly as easy as selecting 'Ubuntu 16.04' as your image, but instead, it will be your shiny image ✨

Create the Instance Template

Next, we'll add the image to an instance template! I used the following settings:

  • 2 CPUs
  • 4 GB memory
  • 20 GB persistent disk
  • Your custom image as the boot disk

If you have any custom networking set up, this is the time to add the configuration for it to your instance template. But no worries if you forget; it's super easy to update the instances in a managed instance group to a newer template 😄

Note: Make sure your VM won't run at > 95% CPU on your test suite though, as this will trigger the autoscaler to add new instances without good reason.
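
To check how close a single VM gets to that limit, you could sample its CPU usage before and after a test run. The sketch below assumes a Linux VM, where the first line of /proc/stat holds cumulative CPU counters (user, nice, system, idle, iowait, irq, softirq, steal). The helper names are hypothetical.

```python
def parse_cpu_line(line):
    """Split the aggregate 'cpu ...' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep the first eight counters; later guest fields are already part of user/nice.
    return [int(value) for value in fields[1:9]]


def cpu_utilization(before, after):
    """Fraction of CPU time spent non-idle between two /proc/stat samples."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    # Index 3 is idle time, index 4 (when present) is iowait.
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 1.0 - idle / total


# Made-up sample lines; on the VM, read the first line of /proc/stat
# once before and once after the test suite runs.
before = "cpu 100 0 50 800 50 0 0 0 0 0"
after = "cpu 200 0 100 850 50 0 0 0 0 0"
print(cpu_utilization(before, after))
```

A result that stays well below 0.95 for a full run suggests the instance type is large enough for one runner's workload.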

This step is important! Now, you should set up the startup script in your instance template:

#!/bin/bash

# Register runners
sudo gitlab-ci-multi-runner register -n \
  --url https://my-gitlab.example.com \
  --registration-token s5-FUy15QVjqMNsgZWPM \
  --executor shell \
  --description "ext-shell-$(hostname)-1"

sudo gitlab-ci-multi-runner register -n \
  --url https://my-gitlab.example.com \
  --registration-token s5-FUy15QVjqMNsgZWPM \
  --executor shell \
  --description "ext-shell-$(hostname)-2"

# Install build tools
sudo apt-get install -y build-essential

# Install docker-compose
sudo curl -L "https://github.com/docker/compose/releases/download/1.16.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Add the gitlab-runner user to the docker group
sudo usermod -aG docker gitlab-runner

This will let your runners automatically register on creation, and set up Docker Compose. The reason I chose to add the docker-compose install here is that it lets me update or change the version whenever I please without having to update my image.
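
The two registration calls in the startup script differ only in the numeric suffix of the description. If you want more (or fewer) runners per instance, generating the commands is one option. This is a sketch that reuses the flags from the startup script; `register_commands` is a hypothetical helper, and the token and hostname below are placeholders.

```python
import socket


def register_commands(url, token, count, hostname=None):
    """Build one `gitlab-ci-multi-runner register` argument list per runner."""
    hostname = hostname or socket.gethostname()
    commands = []
    for index in range(1, count + 1):
        commands.append([
            "gitlab-ci-multi-runner", "register", "-n",
            "--url", url,
            "--registration-token", token,
            "--executor", "shell",
            # Same naming scheme as the startup script: ext-shell-<hostname>-<n>
            "--description", "ext-shell-%s-%d" % (hostname, index),
        ])
    return commands


for command in register_commands(
    "https://my-gitlab.example.com", "<REGISTRATION_TOKEN>", 2, hostname="runner-vm"
):
    print(" ".join(command))
```

Each argument list could then be handed to `subprocess.run(command, check=True)` from a script that runs as root on the instance.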

Now, save your instance template. Almost there!

Step 3. Create the instance group

Next, we'll create the actual instance group that runs the Gitlab runners.

Go to the Compute Engine page from the sidebar of Google Cloud Console, and click 'Instance Groups'. Now add a new Instance Group with your preferred configuration. Don't forget to double-check that you selected 'Managed Instance group'!

Use the Instance Template you created in the previous step here.

For the Statuscheck, I set the threshold to a CPU usage of > 95%. This triggers the autoscaler whenever an instance has more than 95% CPU utilization.
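
To get a feel for what such a CPU threshold does, here is a deliberately simplified model of a utilization-based autoscaler. It is not Google's actual algorithm; it only illustrates the idea of sizing the group so that the average CPU usage stays at or below the target. The function name and the minimum and maximum group sizes are assumptions made for this example.

```python
import math


def desired_instances(cpu_per_instance, target=0.95, min_size=1, max_size=10):
    """Smallest group size that keeps average CPU usage at or below target.

    cpu_per_instance: current CPU utilization (0.0-1.0) of each running instance.
    Simplified illustration only, not the real autoscaler logic.
    """
    if not cpu_per_instance:
        return min_size
    total_load = sum(cpu_per_instance)
    needed = math.ceil(total_load / target)
    return max(min_size, min(max_size, needed))


print(desired_instances([0.40]))        # one lightly loaded VM
print(desired_instances([1.00, 0.99]))  # two saturated VMs
```

In this model a single idle-ish runner never triggers scaling, while two saturated runners ask for a third instance.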

Then click 'Create'. Presto! One more step to go!

Step 4. Enable Gitlab to automatically remove old runners

Having a lot of runners running because of autoscaling is handy, but the runners don't unregister themselves when their instance is removed. A solution is to periodically check for runners that have been inactive for a while. I wrote a small Python script to help with this, which can be added as a cron job on your GitLab instance:

import logging
from datetime import datetime, timedelta

import pytz
import requests
from dateutil import parser

# Log every run to a file so deletions can be traced later.
logger = logging.getLogger('gitlab_python_cron')
logger.setLevel(logging.DEBUG)
fh = logging.FileHandler('/var/log/gitlab_runners_autodelete.log')
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.addHandler(fh)

headers = {"PRIVATE-TOKEN": "<YOUR_PERSONAL_TOKEN_HERE>"}
base_url = "https://my-gitlab.example.com/api/v4/runners/%s"

# Only the first page (up to 100 runners) is fetched per run.
response = requests.get(base_url % 'all', headers=headers, params={"per_page": 100})
response.raise_for_status()
runners = response.json()

logger.info("Got %s runners to check" % len(runners))

# All runners that haven't reported for 8 hours will be deleted.
threshold_time = datetime.now(pytz.timezone('Europe/Amsterdam')) - timedelta(hours=8)

deleted = 0
for runner in runners:
    # Fetch the details of each runner to read its last contact time.
    details = requests.get(base_url % runner["id"], headers=headers).json()
    contacted_at = details.get("contacted_at")
    # Runners without a contact timestamp are skipped instead of crashing the script.
    if contacted_at and parser.parse(contacted_at) < threshold_time:
        requests.delete(base_url % runner["id"], headers=headers)
        logger.info("Deleted runner with ID %s" % runner["id"])
        deleted += 1

logger.info("Deleted %s runners! \n ---------" % deleted)

You'll have to generate a Personal access token for the API and replace <YOUR_PERSONAL_TOKEN_HERE> in the snippet above. You can see how to create one here.
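
The core of the cleanup script is the decision whether a runner is stale. Pulling that decision into a small function makes it easy to test without calling the API. Below is a sketch that uses only the standard library, under the assumption that `contacted_at` is either missing or an ISO 8601 UTC timestamp ending in `Z`. The name `is_stale` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_stale(contacted_at, now, max_age=timedelta(hours=8)):
    """True if the runner last contacted GitLab more than max_age before now.

    contacted_at: ISO 8601 string such as "2017-10-01T08:00:00.000Z", or None.
    Runners without a timestamp are reported as not stale, so they are kept.
    """
    if not contacted_at:
        return False
    # fromisoformat() on older Python versions rejects a trailing "Z".
    if contacted_at.endswith("Z"):
        contacted_at = contacted_at[:-1] + "+00:00"
    last_seen = datetime.fromisoformat(contacted_at)
    return now - last_seen > max_age


now = datetime(2017, 10, 1, 20, 0, tzinfo=timezone.utc)
print(is_stale("2017-10-01T08:00:00.000Z", now))  # last seen 12 hours earlier
print(is_stale("2017-10-01T15:00:00.000Z", now))  # last seen 5 hours earlier
```

Keeping runners that have no timestamp is a cautious choice for this sketch; you may prefer to delete those as well once you're confident they are leftovers.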

And that's it! Now your GitLab runner infrastructure can auto-scale as your test suites run, and you'll pay as you go whenever your runner infrastructure scales beyond the single node! And your GitLab instance even cleans itself up every day... ✨