Laszlo

Hello, I am Laszlo

Software Engineer, .NET developer


Division in Assembly

The HTTP/3 specification (RFC 9114) reserves a range of identifiers for stream types and frame types. The reserved values have the form 0x1f * N + 0x21 for non-negative integer values of N.

A received identifier should be validated against the reserved range. This involves subtracting 33 (0x21) and then checking whether the result is a multiple of 31 (0x1f). Identifiers below 33 can never be reserved. The number 31 is special in this context because it equals 2^5 - 1, or 0b0001_1111 in binary. This property influences the approaches used for validation.

Several strategies were considered to verify whether a value is a multiple of 31:

  • bit manipulation: sum every 5-bit group of the value and check whether the sum is a multiple of 31.

  • lookup table

  • using the remainder (DivRem): Input % 31 == 0

  • using integer division and multiplication (Input / 31) * 31 == Input

  • multiplication by a constant and bit shifting in place of the division (a right shift divides by a power of 2).



Task over ValueTask<>

I recently ran into some code that used ValueTask types extensively alongside Task. I became curious whether it is a good idea to mix async ValueTask<> and Task methods. Moreover, .NET allows decorating methods that return ValueTask<> types with [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder<>))], which adds pooling behavior that reduces allocations.

There are many ways to invoke an async method from another async method. The invoked method can return a Task, a ValueTask<>, or a pooled ValueTask<>. The calling method can return any of these types as well, and can itself be either a sync or an async method. In this post, I create combinations of these methods to measure their performance footprint. The inner method may complete synchronously, asynchronously, or synchronously with a configured probability between 0 and 1, where 1 means it always completes synchronously.

In this blog post, 'sync' completion refers to returning the result of an underlying async method without awaiting it, and without calling .Result or .GetAwaiter().GetResult(). It does not refer to the 'sync-over-async' anti-pattern.

I used BenchmarkDotNet to measure the performance and allocations of these method combinations. Note that the allocations show non-round numbers when the probability is strictly between 0 and 1. This is because BenchmarkDotNet reports allocations averaged per operation. Asynchronous completions call await Task.Yield(), which yields execution of the current task and allows other tasks to run. Although no other tasks should be running in the benchmark, the Mean results include a non-trivial wait for the task continuation to execute.



Async Task Closures

In .NET, async methods are compiled into an async state machine. When the method awaits a call that returns a Task, the state of the current method is captured by a compiler-generated value type that also implements IAsyncStateMachine.

In this blog post, I use .NET 9 to explore some internals of this behavior.

Capturing Structs

I recently encountered one such async method in production code. The method received a few input parameters and created a struct instance, populating its properties from those parameters. It then serialized the struct instance to a string and sent it in an HTTP POST request using HttpClient, awaiting the result.



SIMD Gather and Scatter with Contains All ASCII

Introduction

One of the most difficult problems with SIMD is handling non-contiguous memory access. To address this challenge, AVX-512 adds gather and scatter instructions that load from and store to an array at non-adjacent indexes. These instructions enable a whole new set of algorithms to be vectorized using SIMD operations.

Gather is a single instruction that loads data from non-adjacent indexes of an array into a vector register. Scatter is a single instruction that stores data from a vector register to non-adjacent indexes of an array.

Both instructions take a source or destination register, a reference to the array, and another vector containing the index of each lane to be loaded or stored.



Diagnostics Allocations in ASP.NET Core

Object Allocations

ASP.NET Core's Kestrel is optimized for high performance and scale. To achieve this, it reduces heap allocations either by pooling large objects or by allocating structs on the stack. With fewer allocations, the GC has less work to do. As pooled objects get promoted to Gen2, the most common collections (Gen0 and Gen1) become cheaper because they contain fewer objects to handle. However, pooling is not entirely free:

  • it increases the Gen2 size and its corresponding collections

  • cross-generation references may require the GC to track references from Gen2 regions that point to lower-generation regions (for example, when a pooled object contains a reference to a newly allocated object).

One example is the Http2Stream (or Http3Stream) type, which corresponds to a request-response pair in a connection. Because such objects are large, they are typically pooled. These objects may reference the corresponding HttpContext, which should also be pooled. Otherwise, it incurs either an allocation or a cross-generation reference.
