Scaling ASP.NET Core Minimal API Responses
09/07/2025
5 minutes
I have been running an ASP.NET Core application on a Raspberry Pi Zero 2W. For an early performance test, I deployed an application with two endpoints: / and /data. Both endpoints return a simple string response, as shown below.
app.MapGet("/", () =>
{
    return "Hello World";
});

app.MapGet("/data", (HttpContext ctx) =>
{
    ctx.Response.StatusCode = 200;
    var buffer = ctx.Response.BodyWriter.GetSpan(11);
    "Hello World"u8.CopyTo(buffer);
    ctx.Response.BodyWriter.Advance(11);
});
The application is compiled for .NET 9, without native AOT enabled. I measured the performance of these APIs using the CHttp tool.
CHttp is a simple tool for testing the performance of HTTP endpoints.
The HTTP request used for the comparison is shown below. In this case it tests the /data endpoint:
# @clientsCount 20
# @requestCount 10000
GET http://raspberrypi/data HTTP/2
A similar request can be written for the / (root) endpoint by dropping the path segment. This test sends a total of 10000 requests from 20 different clients. The Raspberry Pi and the test laptop are on the same network segment, with no other traffic mixed in.
Results
The results show a performance difference between the two endpoints. The / (root) endpoint:
RequestCount: 10000, Clients: 20, Connections: 20
| Mean:       20,794 ms    |
| StdDev:     13,028 ms    |
| Error:      130,284 us   |
| Median:     17,086 ms    |
| Min:        4,526 ms     |
| Max:        158,363 ms   |
| 95th:       47,054 ms    |
| Throughput: 109.213 MB/s |
| Req/Sec:    960          |
The /data endpoint:
RequestCount: 10000, Clients: 20, Connections: 20
| Mean:       10,021 ms    |
| StdDev:     4,706 ms     |
| Error:      47,061 us    |
| Median:     8,805 ms     |
| Min:        3,638 ms     |
| Max:        65,352 ms    |
| 95th:       18,108 ms    |
| Throughput: 286.690 MB/s |
| Req/Sec:    1,99E+03     |
When the total number of requests is reduced to 100, the mean is ~10 ms for both endpoints. That means the two solutions take roughly the same amount of time to execute, but /data scales better than the / endpoint. The / endpoint uses significantly more resources, which limits the number of requests it can scale to.
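The Req/Sec figures can be cross-checked against the mean latencies. The sketch below assumes that each of the 20 clients keeps exactly one request in flight at a time (an assumption about how CHttp drives the load, not something stated by the tool's output), in which case throughput is roughly the client count divided by the mean latency. The helper name ExpectedRequestsPerSecond is made up for this illustration.

```csharp
using System;

// Assumption: closed loop, one outstanding request per client,
// so throughput is approximately clients / mean latency.
static double ExpectedRequestsPerSecond(int clients, double meanLatencyMs)
    => clients / (meanLatencyMs / 1000.0);

// Mean latencies (in milliseconds) taken from the two result listings above.
double root = ExpectedRequestsPerSecond(20, 20.794);
double data = ExpectedRequestsPerSecond(20, 10.021);

Console.WriteLine($"/     : {root:F0} req/s");
Console.WriteLine($"/data : {data:F0} req/s");
```

Both values land close to the reported 960 and 1,99E+03 Req/Sec, which suggests that the throughput gap is the latency gap seen through a fixed pool of 20 clients.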
What consumes resources?
ASP.NET Core
One can start the investigation with ASP.NET Core. Does ASP.NET Core handle one solution significantly differently from the other? Let's review what ASP.NET Core does in the general case. I am focusing on the way the application returns the response, as the handling of the incoming request should be identical (other than the path processing).
Both the requests and the server explicitly use HTTP/2. ASP.NET Core's HTTP/2 pipeline executes the following actions:
RequestDelegateFactory returns from the application code to the server code to return the response. There is an immediate difference: the / endpoint sets the content-type header.
The remainder of the 'pipeline' takes the same or a similar path. Numerous WriteAsync methods write the response to the network stream:
HttpResponseBodyFeature.StartAsync is invoked (this feature is implemented by the HttpProtocol class, which is the base type of Http2Stream). HttpProtocol performs housekeeping actions.
Initializes the response, makes sure the server and date headers are set.
Marks the response headers read-only.
Writes the response status code, headers to the output formatter (this is an oversimplified step).
Sends the response data to the output stream.
Flushes the body writer and closes the stream.
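The ordering of these steps can be illustrated with a toy model. This is not Kestrel's code: the locals and the functions SetHeader, Start and WriteBody are hypothetical stand-ins, the "toy" server header value is invented, and an ArrayBufferWriter<byte> plays the role of the network stream. The only point of the sketch is that housekeeping happens once when the response starts, and that headers can no longer change afterwards.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Toy model only: hypothetical stand-ins, not Kestrel's actual types.
var headers = new Dictionary<string, string>();
var output = new ArrayBufferWriter<byte>();   // stands in for the network stream
bool hasStarted = false;

void SetHeader(string name, string value)
{
    // Once the response has started, the headers are read-only.
    if (hasStarted) throw new InvalidOperationException("Headers are read-only.");
    headers[name] = value;
}

void Start()
{
    if (hasStarted) return;
    // Housekeeping: make sure the server and date headers are set.
    headers.TryAdd("server", "toy");
    headers.TryAdd("date", DateTimeOffset.UtcNow.ToString("r"));
    hasStarted = true;   // from here on the headers cannot change
}

void WriteBody(ReadOnlySpan<byte> body)
{
    Start();   // in this toy model the first body write starts the response
    body.CopyTo(output.GetSpan(body.Length));
    output.Advance(body.Length);
}

SetHeader("content-type", "text/plain; charset=utf-8");
WriteBody("Hello World"u8);

// A header set after the response has started is rejected.
bool rejected = false;
try { SetHeader("x-late", "1"); }
catch (InvalidOperationException) { rejected = true; }

Console.WriteLine($"headers: {headers.Count}, body bytes: {output.WrittenCount}, late header rejected: {rejected}");
```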
Writing the Body
However, there is a key difference after all: the / endpoint needs to UTF-8 encode the data. This is done by HttpResponseWritingExtensions, which is invoked by RequestDelegateFactory. This code is fairly simple. In the case of the 'Hello World' response it executes a fast path, which means the whole value is written in a single segment. The structure of the code resembles the structure of the /data endpoint's implementation: it gets a span of memory where the response can be written, copies the data and advances the BodyWriter. One important difference remains though:
In the case of the / endpoint, the string needs to be encoded to UTF-8.
The /data endpoint uses a u8 string, which is calculated at compile time.
This is visible in the IL code:
IL_000e: ldsflda valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=12' '<PrivateImplementationDetails>'::'80EEBAA90073956F594F9A279455C6971A3B260C68821A12C8260B984C3496CC'
IL_0013: ldc.i4.s 11
IL_0015: newobj instance void valuetype [System.Runtime]System.ReadOnlySpan`1<uint8>::.ctor(void*, int32)
IL_001a: stloc.1
The address of 80EEBAA90073956F594F9A279455C6971A3B260C68821A12C8260B984C3496CC points to:
.field assembly static initonly valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=12' '80EEBAA90073956F594F9A279455C6971A3B260C68821A12C8260B984C3496CC' at I_000030F0
.data cil I_000030F0 = bytearray (
48 65 6c 6c 6f 20 57 6f 72 6c 64 00
)
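Notice that the value 48 65 6c ... is the UTF-8 encoded "Hello World" string. The trailing 00 is a null terminator that is not part of the 11-byte span created in the IL above. The same string copy can be observed in the actual assembly instructions at runtime.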
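The two write paths can be put side by side in a small console program. This is a simplified sketch, not the framework's actual code: the names WriteEncoded and WriteLiteral are made up for this example, and an ArrayBufferWriter<byte> stands in for the response BodyWriter (both implement IBufferWriter<byte>, so GetSpan and Advance behave the same way for this purpose).

```csharp
using System;
using System.Buffers;
using System.Text;

// Runtime path: the string is transcoded to UTF-8 on every call.
static void WriteEncoded(IBufferWriter<byte> writer, string text)
{
    Span<byte> destination = writer.GetSpan(Encoding.UTF8.GetMaxByteCount(text.Length));
    int written = Encoding.UTF8.GetBytes(text.AsSpan(), destination);
    writer.Advance(written);
}

// Compile-time path: the u8 literal is already UTF-8, so this is a plain copy.
static void WriteLiteral(IBufferWriter<byte> writer)
{
    ReadOnlySpan<byte> literal = "Hello World"u8;
    literal.CopyTo(writer.GetSpan(literal.Length));
    writer.Advance(literal.Length);
}

var encoded = new ArrayBufferWriter<byte>();
var copied = new ArrayBufferWriter<byte>();
WriteEncoded(encoded, "Hello World");
WriteLiteral(copied);

// Both paths produce the same bytes; only the work needed to get there differs.
bool same = encoded.WrittenSpan.SequenceEqual(copied.WrittenSpan);
string hex = Convert.ToHexString(copied.WrittenSpan);
Console.WriteLine($"{hex} (identical: {same})");
```

The printed hex digits match the bytearray in the IL listing, minus the trailing 00.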
A microbenchmark comparing the two ways of writing data to the response body (copy vs. UTF-8 encoding) shows the performance difference. This benchmark was executed using the BenchmarkDotNet library on a laptop:
| Method      |      Mean |     Error |    StdDev | Code Size | Allocated |
|------------ |----------:|----------:|----------:|----------:|----------:|
| AspNetWrite | 7.4462 ns | 0.1019 ns | 0.0851 ns |   2,211 B |         - |
| U8Write     | 0.4456 ns | 0.0133 ns | 0.0111 ns |     170 B |         - |
There are two main differences. Writing the u8 literal is nearly free, as it is only a memory copy (mov instructions). At the same time, AspNetWrite requires the encoding to happen on every request, which not only results in a larger code size, but also costs precious CPU cycles to execute.
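For a quick sanity check without BenchmarkDotNet, a crude Stopwatch loop can compare the two operations. This is only a rough sketch: it is not the benchmark that produced the table above, the names Encode and Copy are made up for this example, and a loop like this has no proper warm-up or statistics and can be distorted by JIT optimizations, so the absolute numbers should not be trusted.

```csharp
using System;
using System.Diagnostics;
using System.Text;

const int Iterations = 10_000_000;
byte[] buffer = new byte[64];   // reused destination, stands in for the body writer's span

// Per-call UTF-8 encoding of a string.
static int Encode(string text, Span<byte> destination)
    => Encoding.UTF8.GetBytes(text.AsSpan(), destination);

// Copy of a compile-time u8 literal.
static int Copy(Span<byte> destination)
{
    ReadOnlySpan<byte> literal = "Hello World"u8;
    literal.CopyTo(destination);
    return literal.Length;
}

long total = 0;   // accumulates the byte counts so the calls have an observable result
var sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++) total += Encode("Hello World", buffer);
double encodeNs = sw.Elapsed.TotalMilliseconds * 1_000_000 / Iterations;

sw.Restart();
for (int i = 0; i < Iterations; i++) total += Copy(buffer);
double copyNs = sw.Elapsed.TotalMilliseconds * 1_000_000 / Iterations;

Console.WriteLine($"encode: {encodeNs:F2} ns/op, copy: {copyNs:F2} ns/op (bytes written: {total})");
```

BenchmarkDotNet remains the better tool for measurements at this scale, since it handles warm-up, outliers and dead-code elimination.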