Laszlo - Span on Garbage Collection

Span on Garbage Collection

02/22/2026 | 4 minutes to read

A Span represents a continuous array of memory. As it is implemented by a ref struct the compiler makes sure that it does not escape to the heap (in .NET9). A Span may point to multiple types of memory segments, for example, it can point to an unmanaged memory segment, stack allocated memory or heap allocated memory. Span and ReadOnlySpan types are using an interior pointer which allows them to point to an address that is not necessarily the object's MT, but an address inside the object's memory representation. For example, they can point to the n^th element of an array.

From the garbage collector's point of view, the interior pointers need special handling: the interior pointer must be translated to an address that points to the corresponding object's MT so it can be 'marked' as used memory. This is needed as an otherwise unrooted object would get garbage collected. The GC uses the brick table for the address translation.

As a ref struct type lives on the stack, it shall not cause additional allocations or pressure on the GC. Yet the address translation is extra work that the GC needs to do. The design decision for these types to be a ref struct is driven by the additional work required for the address translation. This way the GC does not need to handle interior pointers within heap allocated objects.

Does address translation have a measurable impact on garbage collection?

Measurement

For this measurement I set up a simple experiment: I allocated an array of bytes and implemented two recursive methods that slice into the given array segment. The last recursion waits for a predefined amount of time. This prevents the stack unrolling for the time of the measurement. One of the methods accepts a ReadOnlySpan<byte> as the input, while the other method has an input parameter of a ReadOnlyMemory<byte>:

[MethodImpl(MethodImplOptions.NoOptimization)]
public static int ROS(ReadOnlySpan<byte> target)
{
    if (target.Length == 0)
    {
        Thread.Sleep(WAIT_TIME);
        return 0;
    }
    var nextTaget = target.Slice(1);
    var result = target[0] + ROS(nextTaget);
    return result;
}

The number of recursions is driven by the input array, which is the same size in both cases. The methods are attributed with [MethodImpl(MethodImplOptions.NoOptimization)] to avoid tail recursion optimizations.

Note that, recursion is used as List<T>, arrays, or inline arrays may not be able to represent and handle a collection of ReadOnlySpan<T>s.

This method is invoked on 100 threads that results 100 * 10_000 spans (10_000 is the size of the input array). When all threads are prepared and waiting at the last element processed, a separate thread starts to allocate arbitrary (unrooted) objects and invokes GC.Collect(2, GCCollectionMode.Forced); which results in a non-concurrent induced full garbage collections (the initial array is rooted in Gen2). This thread executed a loop with 100 iterations, invoking the GC 100 times. I used PerfView to measure the GC pause times during this period.

Results

I repeated the measurement with 50, 100 and 200 threads. The results show an increased Total GC pause time in all measurements.

GC Rollup By Generation, all times are in msec. Type ROS represents the ReadOnlySpan and ROM the ReadOnlyMemory usage.

50 threads:

Type	Gen	Count	Max Pause	Max Peak MB	Max Alloc MB/sec	Total Pause	Total Alloc MB	Alloc MB/MSec GC	Survived MB/MSec GC	Mean Pause	Induced
ROM	2	101	287,8	1,3	3.615,904	12.468,4	6,1	0,0	0,003	123,4	100
ROS	2	100	261,8	1,3	3.615,904	12.949,3	5,9	0,0	0,003	129,5	100

100 threads:

Type	Gen	Count	Max Pause	Max Peak MB	Max Alloc MB/sec	Total Pause	Total Alloc MB	Alloc MB/MSec GC	Survived MB/MSec GC	Mean Pause	Induced
ROM	2	100	417,9	1,7	2.174,783	23.880,2	5,9	0,0	0,002	238,8	100
ROS	2	100	432,6	1,7	2.344,688	26.091,6	5,9	0,0	0,002	260,9	100

200 threads:

Type	Gen	Count	Max Pause	Max Peak MB	Max Alloc MB/sec	Total Pause	Total Alloc MB	Alloc MB/MSec GC	Survived MB/MSec GC	Mean Pause	Induced
ROM	2	100	886,8	2,6	1.818,909	45.657,4	6,1	0,0	0,001	456,6	100
ROS	2	100	924,2	2,6	1.719,885	53.250,0	6,1	0,0	0,001	532,5	100

Ignore the Max Pause as it represents a single outlier in the sample in all measurements. In all measurements both the Total Pause and Mean Pause columns indicate that the GC used more CPU time in case of ROS.

Conclusion

The measurements demonstrate that using ReadOnlySpan<T> has a measurable impact on garbage collection performance compared to ReadOnlyMemory<T>. In scenarios with high thread counts and numerous interior pointers on the stack, the GC spends additional time translating these pointers to their corresponding object addresses.

While these results shouldn't discourage the use of Span<T> and ReadOnlySpan<T> in general scenarios, they suggest that in specific high-scale situations where many threads maintain deep call stacks with numerous spans, the GC overhead from interior pointer handling should be considered.

performance GC Span perfview

Hello, I am Laszlo

Software-Enginner, .NET developer