I was working on some code recently. It had some slow bits that I knew about, but I was really surprised when I noticed the log entries before and after a single line were 8 minutes apart!
The line was similar to this:
var matching = source.Where(x => other.Contains(x.Key)).ToList();
source is a
List<T> of custom class that includes a string property
other is type
The line seems pretty innocuous. The only thing I could think of was that both lists contained quite a few items. I reasoned that in that case the
Contains() may not be super efficient, as in the worse case it would have to compare every item in
other list to the
Key value only to find no matches. Maybe an alternative data structure to
List<T> might perform better?
If the list was sorted, then I figured that should make it it faster for searching, so I tried using
SortedSet<T>. Being a ‘Set’ based type, it also eliminates duplicate values, so that could help. The next run, that code only took 4 seconds. Awesome, let’s ship it!
Such a simple thing, but hey wouldn’t that make a good blog post. Why yes I think it would! But to make a decent post I think I need a better comparison than 8 minutes vs 4 seconds. So I created a simple reproduction of the scenario and made use of BenchmarkDotNet to get some statistics.
In building the benchmark code, I thought it would not only be interesting to compare performance over the 3 most recent .NET runtimes, but also to add one more collection type
HashSet<T> to the mix (and what a revelation that was!)
The source code for the benchmarks can be found at https://github.com/flcdrg/list-performance.
Here’s the raw results:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1288 (21H1/May2021Update) 11th Gen Intel Core i7-1185G7 3.00GHz, 1 CPU, 8 logical and 4 physical cores .NET SDK=6.0.100-rc.2.21505.57 [Host] : .NET 6.0.0 (22.214.171.124005), X64 RyuJIT .NET 5.0 : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT .NET 6.0 : .NET 6.0.0 (126.96.36.199005), X64 RyuJIT .NET Core 3.1 : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
|AList||.NET 5.0||.NET 5.0||889,075.6 µs||14,010.42 µs||13,105.35 µs|
|ASortedSet||.NET 5.0||.NET 5.0||13,352.2 µs||211.94 µs||187.88 µs|
|AHashSet||.NET 5.0||.NET 5.0||788.3 µs||6.43 µs||6.01 µs|
|AList||.NET 6.0||.NET 6.0||691,480.0 µs||10,067.18 µs||9,416.85 µs|
|ASortedSet||.NET 6.0||.NET 6.0||12,775.4 µs||192.49 µs||180.05 µs|
|AHashSet||.NET 6.0||.NET 6.0||794.5 µs||6.19 µs||5.79 µs|
|AList||.NET Core 3.1||.NET Core 3.1||943,716.1 µs||6,523.18 µs||6,101.79 µs|
|ASortedSet||.NET Core 3.1||.NET Core 3.1||12,686.2 µs||143.02 µs||133.78 µs|
|AHashSet||.NET Core 3.1||.NET Core 3.1||957.7 µs||8.09 µs||7.57 µs|
I copied the data into Excel to generate some graphs.
First off, let’s compare the three collection types across a single runtime (.NET 6 RC.2)
List<T> is the worst, by a lot. So much that it doesn’t fit that well on the chart!
SortedSet<T> was a better choice, but look -
HashSet<T> beats them both comfortably.
HashSet<T> would have been a better choice. I’ll remember that for next time.
But while we’re here, let’s use this as a useful glimpse into how collection performance has (mostly) improved over time.
List<T> has got better from .NET Core 3.1 to .NET 5 and now .NET 6
HashSet<T> made improvements from 3.1 to 5, with a tiny (possibly insignificant) regression for 6.
The curious one is
SortedSet<T>. It got worse in .NET 5, but clawed back some ground in .NET 6, though not quite back to 3.1 levels.
As to what’s behind the performance changes, a comparison of the source code for each type and reviewing any changes between .NET versions would give some clues - if the changes were insignificant then that does hint at underlying runtime improvements being the cause.
List<T> is such a useful, general purpose collection type. I use it a lot. But after seeing this I’ll keep in mind that it is worth checking performance and that if necessary there are more specialised types that can perform a lot better if they’re able to be used.