-
The fastest Contains() in .NET - Part 2
A follow-up to my recent post about The fastest Contains() in .NET.
John Merchant pointed out that there's more detail on the Microsoft Docs site for the Contains() method for each of these types, which includes the Big O notation.
eg.
Method Order List<T>Contains(T) O(n) SortedSet<T>Contains(T) O(log n) HashSet<T>Contains(T) O(1) Sometimes it pays to read the documentation!
-
The fastest Contains() in .NET
I was working on some code recently. It had some slow bits that I knew about, but I was really surprised when I noticed the log entries before and after a single line were 8 minutes apart!
The line was similar to this:
var matching = source.Where(x => other.Contains(x.Key)).ToList();
Where
source
is aList<T>
of custom class that includes a string propertyKey
, andother
is typeList<string>
.The line seems pretty innocuous. The only thing I could think of was that both lists contained quite a few items. I reasoned that in that case the
Contains()
may not be super efficient, as in the worse case it would have to compare every item inother
list to theKey
value only to find no matches. Maybe an alternative data structure toList<T>
might perform better?If the list was sorted, then I figured that should make it it faster for searching, so I tried using
SortedSet<T>
. Being a 'Set' based type, it also eliminates duplicate values, so that could help. The next run, that code only took 4 seconds. Awesome, let's ship it!Such a simple thing, but hey wouldn't that make a good blog post. Why yes I think it would! But to make a decent post I think I need a better comparison than 8 minutes vs 4 seconds. So I created a simple reproduction of the scenario and made use of BenchmarkDotNet to get some statistics.
In building the benchmark code, I thought it would not only be interesting to compare performance over the 3 most recent .NET runtimes, but also to add one more collection type
HashSet<T>
to the mix (and what a revelation that was!)The source code for the benchmarks can be found at https://github.com/flcdrg/list-performance.
Here's the raw results:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1288 (21H1/May2021Update) 11th Gen Intel Core i7-1185G7 3.00GHz, 1 CPU, 8 logical and 4 physical cores .NET SDK=6.0.100-rc.2.21505.57 [Host] : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT .NET 5.0 : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT .NET 6.0 : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT .NET Core 3.1 : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
Method Job Runtime Mean Error StdDev AList .NET 5.0 .NET 5.0 889,075.6 µs 14,010.42 µs 13,105.35 µs ASortedSet .NET 5.0 .NET 5.0 13,352.2 µs 211.94 µs 187.88 µs AHashSet .NET 5.0 .NET 5.0 788.3 µs 6.43 µs 6.01 µs AList .NET 6.0 .NET 6.0 691,480.0 µs 10,067.18 µs 9,416.85 µs ASortedSet .NET 6.0 .NET 6.0 12,775.4 µs 192.49 µs 180.05 µs AHashSet .NET 6.0 .NET 6.0 794.5 µs 6.19 µs 5.79 µs AList .NET Core 3.1 .NET Core 3.1 943,716.1 µs 6,523.18 µs 6,101.79 µs ASortedSet .NET Core 3.1 .NET Core 3.1 12,686.2 µs 143.02 µs 133.78 µs AHashSet .NET Core 3.1 .NET Core 3.1 957.7 µs 8.09 µs 7.57 µs I copied the data into Excel to generate some graphs.
First off, let's compare the three collection types across a single runtime (.NET 6 RC.2)
So
List<T>
is the worst, by a lot. So much that it doesn't fit that well on the chart!SortedSet<T>
was a better choice, but look -HashSet<T>
beats them both comfortably.Ok, so
HashSet<T>
would have been a better choice. I'll remember that for next time.But while we're here, let's use this as a useful glimpse into how collection performance has (mostly) improved over time.
List<T>
has got better from .NET Core 3.1 to .NET 5 and now .NET 6Likewise,
HashSet<T>
made improvements from 3.1 to 5, with a tiny (possibly insignificant) regression for 6.The curious one is
SortedSet<T>
. It got worse in .NET 5, but clawed back some ground in .NET 6, though not quite back to 3.1 levels.As to what's behind the performance changes, a comparison of the source code for each type and reviewing any changes between .NET versions would give some clues - if the changes were insignificant then that does hint at underlying runtime improvements being the cause.
List<T>
is such a useful, general purpose collection type. I use it a lot. But after seeing this I'll keep in mind that it is worth checking performance and that if necessary there are more specialised types that can perform a lot better if they're able to be used. -
Azure Resource Namer tool
If you host anything in Azure, then you've probably had to think about resource naming conventions, even if your convention is not to have one! But any time you start building up a sizable collection of resources, having a naming convention can make it a lot easier to manage.
Microsoft have a suggested convention as part of their Cloud Adoption Framework. It's pretty straightforward, and makes use of their list of recommended abbreviations for resource types.
I thought I'd have a go making a simple tool that could automate generating the name. There's lots of ways I could have built such a tool, but I figured a simple web page should be achievable and make it accessible to most people.
Most of my time is spent working on back-end code, so here was a chance to play around with a frontend framework. I could have chosen one of the big names, but there's one I've been keeping a keen eye on for a long time that I hadn't ever actually used - Aurelia - and this seemed like a nice opportunity to kick the tyres. I was really pleased with the result. This is not a complicated web application, but I do like the way Aurelia works. I used Aurelia v1, but I'll be keen to upgrade to v2 (it's currently in alpha).
You can try out the tool at https://david.gardiner.net.au/azure-resource-namer/, with the source code at https://github.com/flcdrg/azure-resource-namer. I made use of GitHub Pages for the publishing part. Later on I might get a custom domain and maybe run it with something like Azure Static Web Apps, but this does the job for now.
The naming convention consists of concatenating a number of fields together:
- The Resource Type and Region fields are drop-down lists.
- The Workload and Environment fields are user-editable. The type of resource will determine what characters are allowed here.
- The Instance field is numeric. If you set it to zero then it will be excluded.
There are a lot of different resource types. The tool supports all the types listed in the previously mentioned abbreviations page. I'm slowly adding extra validation information for each resource, so that you get extra hints should you use an invalid character, the name ends up being too long, and also where there are particular formatting requirements (eg storage accounts).
The resource types are listed in resources.ts. For example here's how the rules and information for the 'Resource Group' resource type has been defined:
{ abbrev: 'rg', name: 'Resource group', minLength: 1, maxLength: 90, // https://docs.microsoft.com/en-us/rest/api/resources/resource-groups/create-or-update#uri-parameters regex: /^[-\w\._\(\)]+$/, description: 'Alphanumerics, underscores, parentheses, hyphens, periods, and unicode characters that match the regex documentation. Can\'t end with period.' }
Once I've got all the resource type rules done, there's probably scope for more customisation options and other enhancements. I've already had one of my SixPivot colleagues Dylan contribute the 'copy to clipboard' button.
Give it a try and let me know what you think in the comments. If you've got some ideas or would like to contribute additional features, head over to the repo!