• The fastest Contains() in .NET

    I was working on some code recently. It had some slow bits that I knew about, but I was really surprised when I noticed the log entries before and after a single line were 8 minutes apart!

    The line was similar to this:

    var matching = source.Where(x => other.Contains(x.Key)).ToList();
    

    Where source is a List<T> of custom class that includes a string property Key, and other is type List<string>.

    The line seems pretty innocuous. The only thing I could think of was that both lists contained quite a few items. I reasoned that in that case the Contains() may not be super efficient, as in the worse case it would have to compare every item in other list to the Key value only to find no matches. Maybe an alternative data structure to List<T> might perform better?

    If the list was sorted, then I figured that should make it it faster for searching, so I tried using SortedSet<T>. Being a 'Set' based type, it also eliminates duplicate values, so that could help. The next run, that code only took 4 seconds. Awesome, let's ship it!

    Such a simple thing, but hey wouldn't that make a good blog post. Why yes I think it would! But to make a decent post I think I need a better comparison than 8 minutes vs 4 seconds. So I created a simple reproduction of the scenario and made use of BenchmarkDotNet to get some statistics.

    In building the benchmark code, I thought it would not only be interesting to compare performance over the 3 most recent .NET runtimes, but also to add one more collection type HashSet<T> to the mix (and what a revelation that was!)

    The source code for the benchmarks can be found at https://github.com/flcdrg/list-performance.

    Here's the raw results:

    BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1288 (21H1/May2021Update)
    11th Gen Intel Core i7-1185G7 3.00GHz, 1 CPU, 8 logical and 4 physical cores
    .NET SDK=6.0.100-rc.2.21505.57
      [Host]        : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
      .NET 5.0      : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT
      .NET 6.0      : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
      .NET Core 3.1 : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
    
    Method Job Runtime Mean Error StdDev
    AList .NET 5.0 .NET 5.0 889,075.6 µs 14,010.42 µs 13,105.35 µs
    ASortedSet .NET 5.0 .NET 5.0 13,352.2 µs 211.94 µs 187.88 µs
    AHashSet .NET 5.0 .NET 5.0 788.3 µs 6.43 µs 6.01 µs
    AList .NET 6.0 .NET 6.0 691,480.0 µs 10,067.18 µs 9,416.85 µs
    ASortedSet .NET 6.0 .NET 6.0 12,775.4 µs 192.49 µs 180.05 µs
    AHashSet .NET 6.0 .NET 6.0 794.5 µs 6.19 µs 5.79 µs
    AList .NET Core 3.1 .NET Core 3.1 943,716.1 µs 6,523.18 µs 6,101.79 µs
    ASortedSet .NET Core 3.1 .NET Core 3.1 12,686.2 µs 143.02 µs 133.78 µs
    AHashSet .NET Core 3.1 .NET Core 3.1 957.7 µs 8.09 µs 7.57 µs

    I copied the data into Excel to generate some graphs.

    First off, let's compare the three collection types across a single runtime (.NET 6 RC.2)

    Chart comparing all three collection types running on .NET 6 RC2

    So List<T> is the worst, by a lot. So much that it doesn't fit that well on the chart! SortedSet<T> was a better choice, but look - HashSet<T> beats them both comfortably.

    Ok, so HashSet<T> would have been a better choice. I'll remember that for next time.

    But while we're here, let's use this as a useful glimpse into how collection performance has (mostly) improved over time.

    List<T> has got better from .NET Core 3.1 to .NET 5 and now .NET 6

    Chart comparing List<T> on the three runtimes

    Likewise, HashSet<T> made improvements from 3.1 to 5, with a tiny (possibly insignificant) regression for 6.

    Chart comparing HashSet<T> on the three runtimes

    The curious one is SortedSet<T>. It got worse in .NET 5, but clawed back some ground in .NET 6, though not quite back to 3.1 levels.

    Chart comparing SortedSet<T> on the three runtimes

    As to what's behind the performance changes, a comparison of the source code for each type and reviewing any changes between .NET versions would give some clues - if the changes were insignificant then that does hint at underlying runtime improvements being the cause.

    List<T> is such a useful, general purpose collection type. I use it a lot. But after seeing this I'll keep in mind that it is worth checking performance and that if necessary there are more specialised types that can perform a lot better if they're able to be used.

  • Azure Resource Namer tool

    If you host anything in Azure, then you've probably had to think about resource naming conventions, even if your convention is not to have one! But any time you start building up a sizable collection of resources, having a naming convention can make it a lot easier to manage.

    Microsoft have a suggested convention as part of their Cloud Adoption Framework. It's pretty straightforward, and makes use of their list of recommended abbreviations for resource types.

    I thought I'd have a go making a simple tool that could automate generating the name. There's lots of ways I could have built such a tool, but I figured a simple web page should be achievable and make it accessible to most people.

    Most of my time is spent working on back-end code, so here was a chance to play around with a frontend framework. I could have chosen one of the big names, but there's one I've been keeping a keen eye on for a long time that I hadn't ever actually used - Aurelia - and this seemed like a nice opportunity to kick the tyres. I was really pleased with the result. This is not a complicated web application, but I do like the way Aurelia works. I used Aurelia v1, but I'll be keen to upgrade to v2 (it's currently in alpha).

    You can try out the tool at https://david.gardiner.net.au/azure-resource-namer/, with the source code at https://github.com/flcdrg/azure-resource-namer. I made use of GitHub Pages for the publishing part. Later on I might get a custom domain and maybe run it with something like Azure Static Web Apps, but this does the job for now.

    Screenshot

    The naming convention consists of concatenating a number of fields together:

    • The Resource Type and Region fields are drop-down lists.
    • The Workload and Environment fields are user-editable. The type of resource will determine what characters are allowed here.
    • The Instance field is numeric. If you set it to zero then it will be excluded.

    There are a lot of different resource types. The tool supports all the types listed in the previously mentioned abbreviations page. I'm slowly adding extra validation information for each resource, so that you get extra hints should you use an invalid character, the name ends up being too long, and also where there are particular formatting requirements (eg storage accounts).

    Screenshot of invalid character

    Screenshot of invalid too long

    The resource types are listed in resources.ts. For example here's how the rules and information for the 'Resource Group' resource type has been defined:

    {
        abbrev: 'rg',
        name: 'Resource group',
        minLength: 1,
        maxLength: 90,
        // https://docs.microsoft.com/en-us/rest/api/resources/resource-groups/create-or-update#uri-parameters
        regex: /^[-\w\._\(\)]+$/,
        description: 'Alphanumerics, underscores, parentheses, hyphens, periods, and unicode characters that match the regex documentation. Can\'t end with period.'
    }
    

    Once I've got all the resource type rules done, there's probably scope for more customisation options and other enhancements. I've already had one of my SixPivot colleagues Dylan contribute the 'copy to clipboard' button.

    Give it a try and let me know what you think in the comments. If you've got some ideas or would like to contribute additional features, head over to the repo!

  • STEPtember 2021

    STEPtember logo

    It's the last day of September and also the last day of "STEPtember"!

    STEPtember is a fundraiser to support people with, and research into cerebral palsy. Participants are asked to take at least 10,000 steps each day of the month of September.

    I found out about STEPtember through my colleagues at SixPivot, and signed up with a team to participate this year. Since I've been working from home I do try and take a regular morning walk before starting work - but I found that I needed to do a bit more than that (like cycling, gardening or other kinds of exercise) to make it over the 10,000 threshold. I do hope the extra steps are benefiting my health, as well as any donations going to a great cause.

    If you'd like to sponsor me to help me get to my individual fundraising target, head over to my STEPtember profile page (donations are tax deductible for Australians).