Laszlo

Hello, I am Laszlo

Software-Enginner, .NET developer

Contact Me

Span on Garbage Collection

A Span represents a continuous array of memory. As it is implemented by a ref struct the compiler makes sure that it does not escape to the heap (in .NET9). A Span may point to multiple types of memory segments, for example, it can point to an unmanaged memory segment, stack allocated memory or heap allocated memory. Span and ReadOnlySpan types are using an interior pointer which allows them to point to an address that is not necessarily the object's MT, but an address inside the object's memory representation. For example, they can point to the nth element of an array.

From the garbage collector's point of view, the interior pointers need special handling: the interior pointer must be translated to an address that points to the corresponding object's MT so it can be 'marked' as used memory. This is needed as an otherwise unrooted object would get garbage collected. The GC uses the brick table for the address translation.

As a ref struct type lives on the stack, it shall not cause additional allocations or pressure on the GC. Yet the address translation is extra work that the GC needs to do. The design decision for these types to be a ref struct is driven by the additional work required for the address translation. This way the GC does not need to handle interior pointers within heap allocated objects.

Does address translation have a measurable impact on garbage collection?

Measurement

For this measurement I set up a simple experiment: I allocated an array of bytes and implemented two recursive methods that slice into the given array segment. The last recursion waits for a predefined amount of time. This prevents the stack unrolling for the time of the measurement. One of the methods accepts a ReadOnlySpan<byte> as the input, while the other method has an input parameter of a ReadOnlyMemory<byte>:

[MethodImpl(MethodImplOptions.NoOptimization)]
public static int ROS(ReadOnlySpan<byte> target)
{
    if (target.Length == 0)
    {
        Thread.Sleep(WAIT_TIME);
        return 0;
    }
    var nextTaget = target.Slice(1);
    var result = target[0] + ROS(nextTaget);
    return result;
}

The number of recursions is driven by the input array, which is the same size in both cases. The methods are attributed with [MethodImpl(MethodImplOptions.NoOptimization)] to avoid tail recursion optimizations.

Note that, recursion is used as List<T>, arrays, or inline arrays may not be able to represent and handle a collection of ReadOnlySpan<T>s.

This method is invoked on 100 threads that results 100 * 10_000 spans (10_000 is the size of the input array). When all threads are prepared and waiting at the last element processed, a separate thread starts to allocate arbitrary (unrooted) objects and invokes GC.Collect(2, GCCollectionMode.Forced); which results in a non-concurrent induced full garbage collections (the initial array is rooted in Gen2). This thread executed a loop with 100 iterations, invoking the GC 100 times. I used PerfView to measure the GC pause times during this period.

Results

I repeated the measurement with 50, 100 and 200 threads. The results show an increased Total GC pause time in all measurements.

GC Rollup By Generation, all times are in msec. Type ROS represents the ReadOnlySpan and ROM the ReadOnlyMemory usage.

50 threads:

Type

Gen

Count

Max Pause

Max Peak MB

Max Alloc MB/sec

Total Pause

Total Alloc MB

Alloc MB/MSec GC

Survived MB/MSec GC

Mean Pause

Induced

ROM

2

101

287,8

1,3

3.615,904

12.468,4

6,1

0,0

0,003

123,4

100

ROS

2

100

261,8

1,3

3.615,904

12.949,3

5,9

0,0

0,003

129,5

100

100 threads:

Type

Gen

Count

Max Pause

Max Peak MB

Max Alloc MB/sec

Total Pause

Total Alloc MB

Alloc MB/MSec GC

Survived MB/MSec GC

Mean Pause

Induced

ROM

2

100

417,9

1,7

2.174,783

23.880,2

5,9

0,0

0,002

238,8

100

ROS

2

100

432,6

1,7

2.344,688

26.091,6

5,9

0,0

0,002

260,9

100

200 threads:

Type

Gen

Count

Max Pause

Max Peak MB

Max Alloc MB/sec

Total Pause

Total Alloc MB

Alloc MB/MSec GC

Survived MB/MSec GC

Mean Pause

Induced

ROM

2

100

886,8

2,6

1.818,909

45.657,4

6,1

0,0

0,001

456,6

100

ROS

2

100

924,2

2,6

1.719,885

53.250,0

6,1

0,0

0,001

532,5

100

Ignore the Max Pause as it represents a single outlier in the sample in all measurements. In all measurements both the Total Pause and Mean Pause columns indicate that the GC used more CPU time in case of ROS.

Conclusion

The measurements demonstrate that using ReadOnlySpan<T> has a measurable impact on garbage collection performance compared to ReadOnlyMemory<T>. In scenarios with high thread counts and numerous interior pointers on the stack, the GC spends additional time translating these pointers to their corresponding object addresses.

While these results shouldn't discourage the use of Span<T> and ReadOnlySpan<T> in general scenarios, they suggest that in specific high-scale situations where many threads maintain deep call stacks with numerous spans, the GC overhead from interior pointer handling should be considered.