Docker performance on Azure Pipelines agents
I’ve noticed some interesting variations in build times on Azure Pipelines when using Docker to build and/or run containers. I tracked the issue down to excessive time spent downloading Docker images, so I started investigating whether there are ways to cache or optimise the `docker pull` steps.
So what to do?
First off, if you can run your own build agent, you’re probably not going to see these problems. A dedicated agent means that Docker caches all images locally and can reuse them for subsequent build jobs. That’s the issue with Microsoft-hosted agents: you get a brand new agent for each job, and there’s no way to persist any changes, so Docker’s image caching is of much less value.
Next, I did some research into whether the Cache task could be used to restore the Docker image cache more efficiently. I think the answer is probably ‘no’. I’ve started using this task for another build involving NuGet packages, and there it makes the `dotnet restore` step much faster. The problem with Docker is that its files are stored in various places, and trying to cache `C:\ProgramData\Docker\windowsfilter` didn’t seem to have any effect.
There’s a GitHub issue with some discussion about using `docker load`, and I can confirm the comments on that issue: this did not make things faster (in fact, it made things slower).
Here’s an example: a pipeline that saves the `4.8-windowsservercore-ltsc2019` image in the cache.
```yaml
pool:
  vmImage: 'windows-latest'

variables:
  dockerCache: $(Build.ArtifactStagingDirectory)\.dockercache

steps:
- script: docker images --digests
  displayName: Docker images

# Restores $(dockerCache) on a cache hit. The Cache task also adds a post-job
# step that saves $(dockerCache) on a cache miss ("5. Cache Save" in the table).
- task: Cache@2
  inputs:
    key: 'nuget | "$(Agent.OS)" | azure-pipelines-docker-cache.yml'
    path: '$(dockerCache)'
    cacheHitVar: DOCKER_CACHE_HIT
  displayName: 1. Cache Task

# Load the previously saved image tarball, if the cache restored one.
- script: |
    if exist $(dockerCache)\4.8-windowsservercore-ltsc2019.tar docker load -i $(dockerCache)/4.8-windowsservercore-ltsc2019.tar
  displayName: 2. Docker Load

- script: docker pull mcr.microsoft.com/dotnet/framework/aspnet:4.8-windowsservercore-ltsc2019
  displayName: 3. Docker Pull

# On a cache miss, save the pulled image so the Cache task's post-job step can upload it.
- script: |
    if not exist $(dockerCache) mkdir $(dockerCache)
    docker image save -o $(dockerCache)/4.8-windowsservercore-ltsc2019.tar mcr.microsoft.com/dotnet/framework/aspnet:4.8-windowsservercore-ltsc2019
  condition: ne(variables['DOCKER_CACHE_HIT'], 'true')
  displayName: 4. Docker Save
```
| Build | 1. Cache Task | 2. Docker Load | 3. Docker Pull | 4. Docker Save | 5. Cache Save | Job time (total) |
|---|---|---|---|---|---|---|
| Build 1 (cache miss) | 00:00:03 | 00:00:01 | 00:08:04 | 00:17:06 | 00:06:09 | 00:31:50 |
| Build 2 (cache hit) | 00:05:45 | 00:10:56 | 00:00:03 | 00:00:01 | 00:00:03 | 00:17:15 |
So yes, the second build was faster, but both of these are much slower than a build with no caching at all (compare the `docker pull` step alone in the first build, at 00:08:04). So that’s no help.
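To put numbers on that, here’s a small Python sketch that sums the five step timings from the table for each build and compares them with the 00:08:04 plain pull. The helper names are my own, and the step sums come out a little under the ‘Job time (total)’ column, presumably because the job has other steps and overhead not shown in the table.

```python
from datetime import timedelta
from typing import Iterable


def parse_hms(value: str) -> timedelta:
    """Convert an 'hh:mm:ss' string (as shown in the pipeline timings) to a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total(steps: Iterable[str]) -> timedelta:
    """Sum a sequence of 'hh:mm:ss' step durations."""
    return sum((parse_hms(step) for step in steps), timedelta())


# Steps 1-5 from the table: Cache Task, Docker Load, Docker Pull, Docker Save, Cache Save.
BUILDS = {
    "Build 1 (cache miss)": ["00:00:03", "00:00:01", "00:08:04", "00:17:06", "00:06:09"],
    "Build 2 (cache hit)": ["00:05:45", "00:10:56", "00:00:03", "00:00:01", "00:00:03"],
}

# Baseline: the docker pull step on its own in the cache-miss build.
PLAIN_PULL = parse_hms("00:08:04")

if __name__ == "__main__":
    for name, steps in BUILDS.items():
        spent = total(steps)
        print(f"{name}: {spent} in steps, {spent - PLAIN_PULL} more than a plain pull")
```

Even the cache-hit build spends 0:16:48 in these steps, roughly 8 minutes 44 seconds more than just pulling the image.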
The issue also mentions using `buildctl`, which is part of BuildKit. The trouble is that I’m working with Windows Containers, and BuildKit isn’t currently supported with those. If you’re working with Linux containers, this does sound promising.
Existing images on the agent
It occurred to me: shouldn’t Docker be making use of the images that are already shipped on the hosted agent? The software and tools pre-installed on each agent are documented (for example, for the Windows 2019 agent). This is not a static list: the agents are updated over time as patches and updates are issued for both the OS and applications.
The pre-installed Docker images are listed in that documentation, or you can confirm by running `docker images --digests` in a pipeline step. Here’s the output I got, which matches the documentation:
```
REPOSITORY                                   TAG                              DIGEST                                                                    IMAGE ID       CREATED         SIZE
mcr.microsoft.com/dotnet/framework/aspnet    4.8-windowsservercore-ltsc2019   sha256:dbf97206264133cdef6b49b06fa5d4028482845547c2858a086b5ce5c4513f00   8280f73a9be1   9 days ago      6.87GB
mcr.microsoft.com/dotnet/framework/runtime   4.8-windowsservercore-ltsc2019   sha256:bf47599181ae3877ec680428a99f76d43ffb26251155a6f0b0b76f4e70304c26   bcd511658148   9 days ago      6.51GB
mcr.microsoft.com/windows/servercore         ltsc2019                         sha256:2629881183feda906459163cb58fbdbc001bea76a92b2dc4695c8e5b14f747ae   561b89eac394   2 weeks ago     3.7GB
mcr.microsoft.com/windows/nanoserver         1809                             sha256:8e6807c213b52405fec8a861e0b766055ba9d4f941267adf49ee67526755b63a   9e7d556b2b51   2 weeks ago     251MB
microsoft/aspnetcore-build                   1.0-2.0                          sha256:9ecc7c5a8a7a11dca5f08c860165646cb30d084606360a3a72b9cbe447241c0c   5d8be0910d37   21 months ago   3.99GB
```
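If you’d rather have a pipeline script check this than eyeball the log, a minimal Python sketch could parse that output and look up an exact repository and tag. The function names are hypothetical, and it assumes the default table layout of `docker images --digests` (no `--format` option), where the first four columns never contain spaces.

```python
from typing import List, NamedTuple, Optional


class LocalImage(NamedTuple):
    repository: str
    tag: str
    digest: str
    image_id: str


def parse_docker_images(output: str) -> List[LocalImage]:
    """Parse the default table printed by `docker images --digests`.

    Assumes the default column order (REPOSITORY, TAG, DIGEST, IMAGE ID, ...)
    and that those first four columns never contain spaces.
    """
    images = []
    for line in output.strip().splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) >= 4:
            repository, tag, digest, image_id = fields[:4]
            images.append(LocalImage(repository, tag, digest, image_id))
    return images


def find_local(images: List[LocalImage], repository: str, tag: str) -> Optional[LocalImage]:
    """Return the pre-installed image with exactly this repository and tag, if any."""
    return next((img for img in images if (img.repository, img.tag) == (repository, tag)), None)


# Two rows copied from the agent output above.
SAMPLE = """\
REPOSITORY TAG DIGEST IMAGE ID CREATED SIZE
mcr.microsoft.com/dotnet/framework/aspnet 4.8-windowsservercore-ltsc2019 sha256:dbf97206264133cdef6b49b06fa5d4028482845547c2858a086b5ce5c4513f00 8280f73a9be1 9 days ago 6.87GB
mcr.microsoft.com/windows/servercore ltsc2019 sha256:2629881183feda906459163cb58fbdbc001bea76a92b2dc4695c8e5b14f747ae 561b89eac394 2 weeks ago 3.7GB
"""

if __name__ == "__main__":
    local = find_local(parse_docker_images(SAMPLE),
                       "mcr.microsoft.com/dotnet/framework/aspnet",
                       "4.8-windowsservercore-ltsc2019")
    print(local.digest if local else "not pre-installed")
```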
Out of curiosity, I added a `docker pull mcr.microsoft.com/dotnet/framework/aspnet:4.8-windowsservercore-ltsc2019` step to the pipeline. That should be super-quick, as you can see that the image is already cached. But it wasn’t! It took almost 8 minutes.
```
docker pull mcr.microsoft.com/dotnet/framework/aspnet:4.8-windowsservercore-ltsc2019
========================== Starting Command Output ===========================
"C:\windows\system32\cmd.exe" /D /E:ON /V:OFF /S /C "CALL "D:\a\_temp\d190c5d8-262b-4a70-9c04-216b9ac2b165.cmd""
4.8-windowsservercore-ltsc2019: Pulling from dotnet/framework/aspnet
4612f6d0b889: Already exists
eed17b4baac2: Pulling fs layer
565c587c68c2: Pulling fs layer
c732b140f2ad: Pulling fs layer
84ae672f9921: Pulling fs layer
cd04865d4563: Pulling fs layer
7c75100d3a4d: Pulling fs layer
bea74093ac0e: Pulling fs layer
6353217bf85b: Pulling fs layer
ca397bdd5ee0: Pulling fs layer
ef8702482a58: Pulling fs layer
84ae672f9921: Waiting
cd04865d4563: Waiting
7c75100d3a4d: Waiting
bea74093ac0e: Waiting
6353217bf85b: Waiting
ca397bdd5ee0: Waiting
ef8702482a58: Waiting
c732b140f2ad: Verifying Checksum
c732b140f2ad: Download complete
565c587c68c2: Verifying Checksum
565c587c68c2: Download complete
eed17b4baac2: Verifying Checksum
eed17b4baac2: Download complete
cd04865d4563: Verifying Checksum
cd04865d4563: Download complete
7c75100d3a4d: Verifying Checksum
7c75100d3a4d: Download complete
6353217bf85b: Verifying Checksum
6353217bf85b: Download complete
ca397bdd5ee0: Verifying Checksum
ca397bdd5ee0: Download complete
84ae672f9921: Verifying Checksum
84ae672f9921: Download complete
ef8702482a58: Verifying Checksum
ef8702482a58: Download complete
bea74093ac0e: Verifying Checksum
bea74093ac0e: Download complete
eed17b4baac2: Pull complete
565c587c68c2: Pull complete
c732b140f2ad: Pull complete
84ae672f9921: Pull complete
cd04865d4563: Pull complete
7c75100d3a4d: Pull complete
bea74093ac0e: Pull complete
6353217bf85b: Pull complete
ca397bdd5ee0: Pull complete
ef8702482a58: Pull complete
Digest: sha256:3579480a92f0795c37d6e551139b431eb7cafe798d257c7ce279e10adbd0cb6d
Status: Downloaded newer image for mcr.microsoft.com/dotnet/framework/aspnet:4.8-windowsservercore-ltsc2019
mcr.microsoft.com/dotnet/framework/aspnet:4.8-windowsservercore-ltsc2019
```
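Counting the final status of each layer in that log makes the point concrete. This is a minimal Python sketch with a hypothetical helper name; the regular expression assumes the 12-character short layer IDs and the ‘Already exists’ / ‘Pull complete’ status text shown in the output above.

```python
import re
from collections import Counter

# Matches the final per-layer status lines that docker pull prints.
LAYER_STATUS = re.compile(r"\b([0-9a-f]{12}): (Already exists|Pull complete)")


def summarise_pull(log: str) -> Counter:
    """Count layers reused from the local cache vs. downloaded by a docker pull."""
    final_status = {}
    for layer_id, status in LAYER_STATUS.findall(log):
        final_status[layer_id] = status  # a later line for the same layer wins
    return Counter(final_status.values())


# Layer IDs copied from the log above.
PULLED = ["eed17b4baac2", "565c587c68c2", "c732b140f2ad", "84ae672f9921", "cd04865d4563",
          "7c75100d3a4d", "bea74093ac0e", "6353217bf85b", "ca397bdd5ee0", "ef8702482a58"]
LOG = "4612f6d0b889: Already exists\n" + "\n".join(f"{layer}: Pull complete" for layer in PULLED)

if __name__ == "__main__":
    print(summarise_pull(LOG))
```

For this pull it reports 10 layers downloaded and only 1 (`4612f6d0b889`) reused.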
Why is it pulling all those layers? When does
I then took a look at the Docker Hub page for ASP.NET. It lists the same tag, though interestingly the ‘last modified’ date was 19th of May (4 days ago). Compare that with the `docker images` list above, which says ‘9 days ago’. On closer examination, the sha256 digests are different too!
So I’m pretty sure that’s the problem: there’s a lag between when a new image is published on Docker Hub and when that image is included in the current hosted agent VM.
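The check behind that conclusion can be sketched in a few lines of Python: take the digest the tag resolved to (from the `Digest:` line of a pull log) and compare it with the digest the agent already has. The helper names are hypothetical; the idea is simply that when the tag no longer resolves to the digest on the agent, Docker has to fetch whichever layers differ.

```python
import re
from typing import Optional

# docker pull reports the resolved image digest on a line like "Digest: sha256:<64 hex chars>".
DIGEST_LINE = re.compile(r"Digest: (sha256:[0-9a-f]{64})")


def digest_from_pull_log(log: str) -> Optional[str]:
    """Extract the registry digest that a docker pull resolved the tag to."""
    match = DIGEST_LINE.search(log)
    return match.group(1) if match else None


def agent_image_is_stale(agent_digest: str, registry_digest: str) -> bool:
    """The agent's copy only matches the tag if both point at the same digest."""
    return agent_digest != registry_digest


# Digests copied from the docker images output and the pull log above.
AGENT_DIGEST = "sha256:dbf97206264133cdef6b49b06fa5d4028482845547c2858a086b5ce5c4513f00"
PULL_LOG = ("Digest: sha256:3579480a92f0795c37d6e551139b431eb7cafe798d257c7ce279e10adbd0cb6d "
            "Status: Downloaded newer image for "
            "mcr.microsoft.com/dotnet/framework/aspnet:4.8-windowsservercore-ltsc2019")

if __name__ == "__main__":
    registry_digest = digest_from_pull_log(PULL_LOG)
    print("stale" if registry_digest and agent_image_is_stale(AGENT_DIGEST, registry_digest) else "current")
```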
One other thing to watch out for. Notice that one of the images listed on the agent is `mcr.microsoft.com/windows/servercore:ltsc2019`? There are different ‘architecture’ options for container images. For Windows Containers, these are usually either ‘multiarch’ or ‘amd64’ (for example, both are listed for Windows Server Core). The trap is that these are two different images: if you specify the `ltsc2019-amd64` tag, it won’t match the image on the agent.
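A guard against that trap could be as simple as an exact tag lookup against what the agent ships. This Python sketch is hypothetical: the `AGENT_TAGS` mapping is hard-coded from the `docker images` output above rather than read from the agent, and `split_ref` only handles references with an explicit tag.

```python
from typing import Dict, Set, Tuple

# Repository -> tags pre-installed on the agent (taken from the docker images output above).
AGENT_TAGS: Dict[str, Set[str]] = {
    "mcr.microsoft.com/windows/servercore": {"ltsc2019"},
    "mcr.microsoft.com/windows/nanoserver": {"1809"},
}


def split_ref(ref: str) -> Tuple[str, str]:
    """Split 'repository:tag' into its parts. Assumes an explicit tag and no @digest."""
    repository, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        raise ValueError(f"expected 'repository:tag', got {ref!r}")
    return repository, tag


def is_preinstalled(ref: str) -> bool:
    """True only when this exact tag is on the agent; 'ltsc2019-amd64' is not 'ltsc2019'."""
    repository, tag = split_ref(ref)
    return tag in AGENT_TAGS.get(repository, set())


if __name__ == "__main__":
    for ref in ("mcr.microsoft.com/windows/servercore:ltsc2019",
                "mcr.microsoft.com/windows/servercore:ltsc2019-amd64"):
        print(ref, is_preinstalled(ref))
```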
So that seems like a reasonable hypothesis. Because we’re either explicitly doing a `docker pull` or depending on images that were built against a different version of the base image, we get a cache miss and pay the penalty of downloading an almost entirely new image.
I think the problem can be managed by ensuring that images are built against the current base images on the agent. If you’re building images in different pipelines and then storing those in a private registry (Azure Container Registry for example), then you’re probably going to need to refresh those as soon as the agent images are updated.
Following the releases in the GitHub Actions virtual-environments repo appears to be the easiest way to know when the agent software is changing. Yes, GitHub Actions and Azure Pipelines share the same agent configurations.
If you really need to fix on a version, don’t rely on tags alone; your best bet is to reference the sha256 digest. That way there’s no ambiguity. But be aware that you’ll more than likely end up referring to an image that isn’t cached on the agent. In that case, if build time matters, a self-hosted agent is probably the best strategy.
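Docker accepts digest references in the form `repository@sha256:<digest>`, both for `docker pull` and in a Dockerfile `FROM` line. Here’s a small, hypothetical Python helper that builds such a reference and rejects anything that isn’t a sha256 digest. The example digest is the one the agent had at the time of the `docker images` output above; it will go stale once the agent image is updated.

```python
import re

SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def pin_by_digest(repository: str, digest: str) -> str:
    """Build a 'repository@sha256:...' reference for docker pull or a Dockerfile FROM line."""
    if not SHA256_DIGEST.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{repository}@{digest}"


if __name__ == "__main__":
    print(pin_by_digest(
        "mcr.microsoft.com/dotnet/framework/aspnet",
        "sha256:dbf97206264133cdef6b49b06fa5d4028482845547c2858a086b5ce5c4513f00",
    ))
```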