by Benoit Galati
3 min read

Tags

  • CI
  • caching
  • docker

Let’s say you have some great Docker images for your dev/prod environment, but they take quite a long time to build, especially in a CI that you want to be as fast as possible.

If you have a Docker registry available, you can avoid the pain of rebuilding them completely every single time.

Docker caching mechanism

The idea is to use the --cache-from option of docker build, but through docker-compose, because I like managing as many things as possible with docker-compose.
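
Under the hood, cache_from in docker-compose maps to the same option on docker build. With plain docker commands, and reusing the image names from the example below purely for illustration, the trick looks roughly like this:

# Make the layers of a previously built image available locally
docker pull example/docker-caching:ci || true
# Reuse those layers as cache when building (the :new tag is just illustrative)
docker build --cache-from example/docker-caching:ci -t example/docker-caching:new docker/dev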

The problem that needs to be solved first is: how can I have a proper Docker tag to avoid re-building the whole image on every CI run?

The solution is simple: generate an md5 hash of your Docker image definition and its dependencies (e.g. the Dockerfile, config files, etc.) and use it as the tag.
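
In shell terms, the idea boils down to the two commands below (the docker directory is the one used in the example that follows):

# Hash every file under docker/, write the sorted list to docker-ci.md5,
# then hash that file to get a single stable tag
find docker -type f -exec md5sum {} \; | sort -k 2 > docker-ci.md5
md5sum docker-ci.md5 | cut -f1 -d" "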

Let’s dive into the example to see how we can do this.

The fictional example

Everything is simplified here to showcase the big picture in CircleCI. The important parts are directly explained in the examples.

We start with the docker-compose.yml file:

version: "3.2"
    app:
      services:
        volumes:
            - .:/app
        # This is the image name that will be generated during the build
        # DOCKER_IMAGE_MD5 is generated at runtime by the the CI
        # It is used for pulling and pushing
        image: ${DOCKER_IMAGE}
        build:
            context: docker/dev
            # Two images are used for caching
            # 1) A specific image hash to try to build the full image with the cache
            # 2) The last CI build to try reusing as much cache as possible when
            #    a brand new image needs to be built
            cache_from:
                - ${DOCKER_IMAGE}
                - ${DOCKER_IMAGE_CI}
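
With this file in place, a cached build can be reproduced locally with something like the following (the exported values are made-up examples; in CI they are generated by the step shown below):

export DOCKER_IMAGE_MD5=0123456789abcdef0123456789abcdef
export DOCKER_IMAGE=example/docker-caching:$DOCKER_IMAGE_MD5
export DOCKER_IMAGE_CI=example/docker-caching:ci
docker-compose pull --ignore-pull-failures
docker-compose build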

Here is the CircleCI config:

version: 2

references:
    # Use CircleCI machine executor in order to use volumes
    configure_base: &configure_base
        machine:
            enabled: true
            image: circleci/classic:201711-01

    # This is the task which generates an md5 hash of the content of the docker directory.
    # Basically it generates an md5 of all files in the "docker" directory, then puts them
    # in a file called "docker-ci.md5" and finally generates an md5 of this file.
    generate_docker_hashes: &generate_docker_hashes
        run: |
            test -e docker-ci.md5 || find docker -type f -exec md5sum {} \; | sort -k 2 > docker-ci.md5
            echo 'export DOCKER_IMAGE_MD5=$(md5sum docker-ci.md5 | cut -f1 -d" ")' >> $BASH_ENV
            echo 'export DOCKER_IMAGE=example/docker-caching:$DOCKER_IMAGE_MD5' >> $BASH_ENV
            echo 'export DOCKER_IMAGE_CI=example/docker-caching:ci' >> $BASH_ENV

    authenticate_on_registry: &authenticate_on_registry
        run: docker login -u $DOCKER_LOGIN -p $DOCKER_PASSWORD

jobs:
    build:
        <<: *configure_base
        steps:
            - checkout

            - *authenticate_on_registry
            - *generate_docker_hashes

            - run: |
                set -x

                # Try to pull, then build a new image if it is needed
                docker pull $DOCKER_IMAGE_CI || true
                docker-compose pull --ignore-pull-failures --parallel
                docker-compose build

                # Build is done. We tag the (perhaps new) image with "ci" (i.e. the last built version) and push.
                docker tag $DOCKER_IMAGE $DOCKER_IMAGE_CI
                docker-compose push

    unit-test:
        <<: *configure_base
        steps:
            - checkout

            - *generate_docker_hashes
            - *authenticate_on_registry

            # At this point, the pull must be successful because we just pushed that hash
            - run: docker-compose pull --parallel
            # Run tests now as you would do on your machine
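            # For example (the make test command is just a placeholder, adapt it to your project):
            - run: docker-compose run --rm app make test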

workflows:
    version: 2
    build_test_deploy:
        jobs:
          - build

          - unit-test:
              requires:
                - build

Conclusion

That’s it. We just saw how to build Docker images pragmatically in the CI for your tests.

The major downside is that the CI will still need more time to execute than a CI without Docker, because yes, it is an overhead.

The big benefit is that you can reuse any known environment (i.e. dev, prod, etc.) in your CI for your tests or anything else, so you can be almost sure that what works on your machine will work the same way in your CI.

I recently came across this problem: I was trying to add Selenium tests to the CI and it was a real pain. Then I switched to this approach and everything was smooth, well, almost :-)