Laszlo - Async Task Closures

Async Task Closures

10/26/2025 | 5 minutes to read

In .NET, async methods are compiled into a state machine. When a method awaits a call that returns a Task, the state of the current method is captured by a compiler-generated type that implements IAsyncStateMachine; in Release builds this type is a value type (struct), in Debug builds it is a class.

In this blog post I use .NET 9 to explore some internals of this behavior.
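To make the shape of such a type concrete, here is a hand-written approximation of the state machine a Release build might generate for a small method that yields and then returns the length of its string argument. It is a sketch under assumptions: DoStateMachine, its field names and Manual.Do are hypothetical, the real compiler output uses unspeakable names such as '<>1__state', and details vary between compiler versions.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical, hand-written approximation of the state machine for:
//     async Task<int> Do(string parameter) { await Task.Yield(); return parameter.Length; }
public struct DoStateMachine : IAsyncStateMachine
{
    public int State;                              // -1: running, 0: suspended at the await, -2: finished
    public AsyncTaskMethodBuilder<int> Builder;    // creates and completes the returned Task<int>
    public string Parameter;                       // the parameter lives in a field so it survives the await
    private YieldAwaitable.YieldAwaiter _awaiter;  // the pending awaiter while suspended

    public void MoveNext()
    {
        int result;
        try
        {
            YieldAwaitable.YieldAwaiter awaiter;
            if (State != 0)
            {
                // First call: start the await.
                awaiter = Task.Yield().GetAwaiter();
                if (!awaiter.IsCompleted)
                {
                    State = 0;
                    _awaiter = awaiter;
                    // On the first real suspension the builder copies this struct into a heap-allocated box.
                    Builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
                    return;
                }
            }
            else
            {
                // Resumed (on the boxed copy) after the awaited operation completed.
                awaiter = _awaiter;
                _awaiter = default;
                State = -1;
            }
            awaiter.GetResult();
            result = Parameter.Length;
        }
        catch (Exception ex)
        {
            State = -2;
            Builder.SetException(ex);
            return;
        }
        State = -2;
        Builder.SetResult(result);
    }

    public void SetStateMachine(IAsyncStateMachine stateMachine) => Builder.SetStateMachine(stateMachine);
}

public static class Manual
{
    // What the async method itself becomes: create the struct on the stack, start it, return its task.
    public static Task<int> Do(string parameter)
    {
        var stateMachine = new DoStateMachine
        {
            Builder = AsyncTaskMethodBuilder<int>.Create(),
            Parameter = parameter,
            State = -1,
        };
        stateMachine.Builder.Start(ref stateMachine);
        return stateMachine.Builder.Task;
    }
}
```

The point that matters for this post: anything the method still needs after an await has to live in a field of the struct, and the struct is copied to the heap only when an await actually suspends.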

Capturing Structs

I recently encountered such an async method in production code. The method received a few input parameters and created a struct instance, populating its properties from those parameters. It then serialized the struct instance to a string and sent it in an HTTP POST request using HttpClient, awaiting the result.
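The production code itself is not shown here, but a minimal sketch of a method with the same shape could look like the following. Data, Sender.SendAsync, EchoLengthHandler and the URL are hypothetical; the stub handler replaces the network so the sketch runs offline.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the production struct (hypothetical).
public struct Data
{
    public string Name { get; set; }
    public int Id { get; set; }
}

// Hypothetical stub that replaces the network: it answers every request
// with the length of the request body.
public sealed class EchoLengthHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        string body = request.Content is null
            ? string.Empty
            : await request.Content.ReadAsStringAsync(cancellationToken);
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(body.Length.ToString()),
        };
    }
}

public static class Sender
{
    // Same shape as the production method: build the struct, serialize it, POST it, await the reply.
    public static async Task<int> SendAsync(HttpClient client, string name, int id)
    {
        var d = new Data { Name = name, Id = id };
        string json = JsonSerializer.Serialize(d); // d is not needed after this point
        using var content = new StringContent(json, Encoding.UTF8, "application/json");
        using HttpResponseMessage response =
            await client.PostAsync("https://example.invalid/items", content);
        response.EnsureSuccessStatusCode();
        return int.Parse(await response.Content.ReadAsStringAsync());
    }
}
```

Passing `new HttpClient(new EchoLengthHandler())` to SendAsync exercises the whole method without any network access.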

This method very likely completes asynchronously because of the network call. That raised a question: is a struct a good choice for the serialized data, or would the code perform better with a class?

More precisely: does the async state machine capture the struct instance, and if it does, does that affect the application's performance?

To investigate, let's create a method that mimics the one described above without making an actual HTTP call. In the WorkNo method, the input values are generated locally (the Id is random), and the Data type represents the struct, with a string Name property and an int Id property.

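// The serialized struct (JsonSerializer comes from System.Text.Json)
public struct Data
{
    public string Name { get; set; }
    public int Id { get; set; }
}

public async Task<int> WorkNo()
{
    var d = new Data();
    d.Name = "Test";
    d.Id = Random.Shared.Next();
    var s = JsonSerializer.Serialize(d);
    var r = await Do(s); // d is not used after this await
    return r;
}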

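The Do method yields and then returns the length of the input string. It substitutes for HttpClient's SendAsync method; because Task.Yield always suspends the method, Do completes asynchronously, as a network call typically would: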

[MethodImpl(MethodImplOptions.NoInlining)]
async Task<int> Do(string parameter)
{
    await Task.Yield();
    return parameter.Length;
}

I also added a WorkDto method for comparison. The difference is that it returns the sum of the Do method's result and the Data instance's Id property, so d is still read after the await:

public async Task<int> WorkDto()
{
    // ...same as WorkNo up to and including the await
    return r + d.Id;
}

The generated MSIL reveals the internals of the state machine; I used ILSpy to inspect it. The compiler generates one value type for the WorkNo method's state machine and another for WorkDto's. While they look very similar, there is a key difference: the type generated for WorkDto contains (and writes and reads) an additional field, .field private valuetype Data '<d>5__2'. These are the fields of WorkDto's state machine:

	// Fields
	.field public int32 '<>1__state'
	.field public valuetype [System.Runtime]System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<int32> '<>t__builder'
	.custom instance void [System.Runtime]System.Runtime.CompilerServices.NullableAttribute::.ctor(uint8) = (
		01 00 00 00 00
	)
	.field public class Benchmarks '<>4__this'
	.custom instance void [System.Runtime]System.Runtime.CompilerServices.NullableAttribute::.ctor(uint8) = (
		01 00 00 00 00
	)
	.field private valuetype Data '<d>5__2'
	.field private valuetype [System.Runtime]System.Runtime.CompilerServices.TaskAwaiter`1<int32> '<>u__1'
	.custom instance void [System.Runtime]System.Runtime.CompilerServices.NullableAttribute::.ctor(uint8) = (
		01 00 00 00 00
	)

This field stores the Data value as part of the state machine. When the method completes asynchronously, the state machine struct is copied to the heap (boxed) so that it survives the suspension, and the Data value is copied along with it. As a result, more memory is allocated on the heap and more data is copied there.
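One way to observe this boxing is to compare the bytes allocated on the calling thread while an async method starts, once when it completes synchronously and once when it suspends. This is a sketch under assumptions: BoxingDemo, Work and MeasureStart are hypothetical names, GC.GetAllocatedBytesForCurrentThread is used as a per-thread counter, and exact byte counts depend on the runtime version and build configuration, so only the two paths are compared.

```csharp
using System;
using System.Threading.Tasks;

public static class BoxingDemo
{
    // Hypothetical helper: completes synchronously unless asked to yield first.
    public static async Task<int> Work(bool yieldFirst)
    {
        if (yieldFirst)
        {
            await Task.Yield(); // suspends, so the state machine has to move to the heap
        }
        return 1;
    }

    // Bytes allocated on this thread while Work runs up to its first suspension (or completion).
    public static long MeasureStart(bool yieldFirst)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        Task<int> task = Work(yieldFirst);
        long after = GC.GetAllocatedBytesForCurrentThread();
        task.GetAwaiter().GetResult(); // wait for completion outside the measured window
        return after - before;
    }
}
```

Call MeasureStart once per path as a warm-up, then measure both. The suspending path should report more bytes, since that is where the builder allocates the box. In a Debug build the state machine is a class and is allocated on both paths, but the box still makes the suspending path larger.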

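When the struct is not used after the await, the compiler does not need to preserve it. Because WorkNo does not read d after the await, its state machine has no Data field, which avoids the extra allocation and copy. This applies to optimized Release builds; Debug builds hoist locals into the state machine more eagerly to support debugging.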
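ILSpy is one way to see this; reflection is another. The compiler marks every async method with AsyncStateMachineAttribute, whose StateMachineType property points at the generated type, so its fields can be listed at runtime. The Benchmarks class below repeats the post's methods, and StateMachineInspector.GetStateMachineFields is a hypothetical helper.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Threading.Tasks;

public struct Data
{
    public string Name { get; set; }
    public int Id { get; set; }
}

public class Benchmarks
{
    public async Task<int> WorkNo()
    {
        var d = new Data { Name = "Test", Id = Random.Shared.Next() };
        var s = JsonSerializer.Serialize(d);
        var r = await Do(s);
        return r;
    }

    public async Task<int> WorkDto()
    {
        var d = new Data { Name = "Test", Id = Random.Shared.Next() };
        var s = JsonSerializer.Serialize(d);
        var r = await Do(s);
        return r + d.Id; // d is read after the await
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    async Task<int> Do(string parameter)
    {
        await Task.Yield();
        return parameter.Length;
    }
}

public static class StateMachineInspector
{
    // Finds the compiler-generated state machine type of an async method and returns its instance fields.
    public static FieldInfo[] GetStateMachineFields(Type owner, string methodName)
    {
        const BindingFlags All = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        MethodInfo method = owner.GetMethod(methodName, All)
            ?? throw new ArgumentException($"Method {methodName} not found.");
        Type stateMachine = method.GetCustomAttribute<AsyncStateMachineAttribute>()?.StateMachineType
            ?? throw new ArgumentException($"{methodName} is not an async method.");
        return stateMachine.GetFields(All);
    }
}
```

Printing FieldType.Name and Name for each field of WorkNo and WorkDto should show a Data-typed field only for WorkDto in a Release build; a Debug build may list d for both methods.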

Performance

The MSIL indicates that the WorkNo method can be faster. Does it manifest in better performance results, or does the JIT optimize away the extra costs?

Let's create a microbenchmark to compare the performance. To make sure that the addition itself does not skew the results, I added a third method for comparison. WorkInt does the same job as WorkDto, but only the int value needs to be captured:

public async Task<int> WorkInt()
{
    var d = new Data();
    d.Name = "Test";
    int v = Random.Shared.Next();
    d.Id = v;
    var s = JsonSerializer.Serialize(d);
    var r = await Do(s);
    return r + v;
}

I used BenchmarkDotNet to run the microbenchmarks:

BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.3194)
12th Gen Intel Core i7-1255U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 9.0.103
  [Host]     : .NET 9.0.2 (9.0.225.6610), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.2 (9.0.225.6610), X64 RyuJIT AVX2


| Method    | Mean       | Error   | StdDev  | Code Size | Gen0   | Allocated |
|---------- |-----------:|--------:|--------:|----------:|-------:|----------:|
| WorkDto   |   969.3 ns | 6.21 ns | 4.85 ns |      78 B | 0.0534 |     344 B |
| WorkNo    |   914.3 ns | 7.76 ns | 6.88 ns |      78 B | 0.0515 |     328 B |
| WorkInt   |   930.0 ns | 5.12 ns | 4.79 ns |      78 B | 0.0515 |     328 B |

The results show that WorkDto indeed allocates more on the heap (344 B versus 328 B) and runs longer (969.3 ns versus 914.3 ns). Interestingly, WorkInt does not allocate more than WorkNo. Its generated state machine does capture the int value in a field, .field private int32 '<v>5__2', but the additional 4-byte field appears to fit into padding that the state machine struct already has on an x64 machine. Indeed, replacing the int with a long results in an allocation that is 8 bytes larger. An int is 4 bytes and a long is 8 bytes; with fields aligned to 8 bytes on x64, the int occupies space that would otherwise be padding, while the long has to extend the struct.
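The padding explanation can be checked without running the benchmarks by comparing the sizes of small structs that mimic the state machine's fields. This is a sketch under assumptions: the shape structs are hypothetical, the builder, this and the awaiter are each modeled as one object reference (8 bytes on x64), and LayoutKind.Auto is used so the runtime may reorder fields to reduce padding, which is assumed to match the real state machine's layout.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public struct Data
{
    public string Name { get; set; }
    public int Id { get; set; }
}

[StructLayout(LayoutKind.Auto)]
public struct BaseShape      // like WorkNo: the state plus three references
{
    public int State;
    public object Builder, This, Awaiter;
}

[StructLayout(LayoutKind.Auto)]
public struct WithInt        // like WorkInt: one extra int
{
    public int State;
    public object Builder, This, Awaiter;
    public int V;
}

[StructLayout(LayoutKind.Auto)]
public struct WithLong       // the int replaced by a long
{
    public int State;
    public object Builder, This, Awaiter;
    public long V;
}

[StructLayout(LayoutKind.Auto)]
public struct WithData       // like WorkDto: an extra Data field
{
    public int State;
    public object Builder, This, Awaiter;
    public Data D;
}

public static class LayoutDemo
{
    // Prints the managed size of each shape.
    public static void Print()
    {
        Console.WriteLine($"base: {Unsafe.SizeOf<BaseShape>()} B");
        Console.WriteLine($"int:  {Unsafe.SizeOf<WithInt>()} B");
        Console.WriteLine($"long: {Unsafe.SizeOf<WithLong>()} B");
        Console.WriteLine($"Data: {Unsafe.SizeOf<WithData>()} B");
    }
}
```

In a 64-bit process this should report 32, 32, 40 and 48 bytes: the extra int fits into the 4 bytes of padding after State, the long grows the struct by 8 bytes, and the 16-byte Data grows it by 16 bytes, consistent with the 16-byte difference between WorkDto and WorkNo in the table.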


Hello, I am Laszlo