Laszlo

Hello, I am Laszlo

Software-Engineer, .NET developer

Measuring Lock Contention

Scaling multi-threaded applications is often limited by the proportion of the workload that cannot be parallelized, a relationship captured by Amdahl's law. Code sections guarded by locks are a common example: they restrict scalability by allowing only one thread (or n threads in the case of semaphores) to execute at a time, so the other threads must wait. That waiting is lock contention.
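To make the limit concrete, here is a small sketch of Amdahl's law in C#. The 90% parallel fraction is an assumed figure for illustration, not a measurement from the test application.

```csharp
using System;

// Amdahl's law: speedup on n threads when a fraction p of the work can run in parallel.
// The remaining (1 - p), for example time spent inside a lock, stays serial.
static double Speedup(double parallelFraction, int threads) =>
    1.0 / ((1.0 - parallelFraction) + parallelFraction / threads);

foreach (int threads in new[] { 1, 2, 4, 8, 20 })
{
    Console.WriteLine($"{threads,2} threads: {Speedup(0.9, threads):F2}x");
}
```

With 10% of the work serialized, 20 threads yield a speedup of roughly 6.9x, and no number of threads can exceed 10x.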

Context

In an HTTP/2 server, requests and responses on a connection are multiplexed. The server needs to write concurrently computed responses to a network stream. Such multiplexing can be achieved in multiple ways, for example with Channels, locks, or custom scheduling.

Each solution has its own performance characteristics. To better understand these characteristics, I investigated how and when locks cause contention. This blog post details two different approaches to measure lock contention.

Test Application

For the measurement, I prepared a web application with a custom server using locks to synchronize HTTP/2 response writes to the TCP connection.
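The general pattern of a lock-guarded writer can be sketched as follows. This is not the CHttpServer implementation: LockedFrameWriter, the four-byte frames, and the MemoryStream standing in for the TCP connection are all hypothetical. The sketch assumes .NET 9 or later for System.Threading.Lock.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var output = new MemoryStream();
var writer = new LockedFrameWriter(output);

// 20 concurrent producers, each writing 1,000 four-byte "frames" filled with its own id.
var producers = Enumerable.Range(0, 20).Select(id => Task.Run(() =>
{
    byte[] frame = Enumerable.Repeat((byte)id, 4).ToArray();
    for (int i = 0; i < 1_000; i++)
        writer.WriteFrame(frame);
}));
await Task.WhenAll(producers);

// Every frame must contain a single id; a mixed frame would mean interleaved writes.
byte[] bytes = output.ToArray();
bool intact = bytes.Chunk(4).All(f => f.All(b => b == f[0]));
Console.WriteLine($"{bytes.Length} bytes written, frames intact: {intact}");

// Hypothetical writer that serializes access to a shared stream.
sealed class LockedFrameWriter(Stream stream)
{
    // System.Threading.Lock needs .NET 9+; on older runtimes a plain object works with lock(...).
    private readonly Lock _gate = new();

    public void WriteFrame(ReadOnlySpan<byte> frame)
    {
        // Only one thread may be inside at a time. The others wait here,
        // and that wait is what the contention events report.
        lock (_gate)
        {
            stream.Write(frame);
        }
    }
}
```

The run should report 80,000 bytes with all frames intact. The cost of that safety is that the 20 producers queue up at the lock, which is the contention the rest of this post measures.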

I used CHttp to send 20,000 requests on a single connection:

###
# @clientsCount 20
# @requestCount 20000
# @shared-socketHandler true
GET https://{{host}}/direct HTTP/2

The clients count is set to 20, ensuring that 20 threads send concurrent requests using a single HTTP/2 connection.

Lock Contention Using Visual Studio

  1. Open Visual Studio's Performance Profiler via Debug -> Performance Profiler...
  2. In the Events Viewer, select .NET Runtime -> Contention
  3. Set Event level to Always or Verbose
  4. Run the web application and send the requests
  5. After stopping the diagnostic session, Visual Studio will display all captured events
  6. Filter the events by selecting the Contention/LockCreated, Contention/Start, and Contention/Stop events.

When you select a Start event, you'll see a stack trace (if stack collection was enabled for the profiling session), a lock ID, and additional metadata. The lock ID and stack trace are particularly useful for identifying contention points.

Sample stack trace:

system.private.corelib.dll!0x00007ffafd8361e0
system.private.corelib.dll!0x00007ffafd7070ee
System.Threading.Lock.EnterAndGetCurrentThreadId()
System.Threading.Lock.EnterScope()
CHttpServer.SyncHttp2ResponseWriter.ScheduleWriteHeaders(CHttpServer.Http2Stream)
CHttpServer.Http2Stream.StartAsync()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start<T>(T)
CHttpServer.Http2Stream.StartAsync(System.Threading.CancellationToken)
CHttpServer.Http2Stream.FlushResponseBodyAsync()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start<T>(T)
CHttpServer.Http2Stream.FlushResponseBodyAsync(uint, System.Threading.CancellationToken)
CHttpServer.PostFlushHttp2StreamPipeWriter.FlushAsync()

Toggle the 'Show Just My Code' button to expand all managed layers of the stack.

The Stop event includes both a stack trace and duration in nanoseconds. While this data is easily understood, Visual Studio could benefit from better aggregation views (e.g., total lock duration per lock) or grouping locks based on their source code location.

To explore details further, select a single event and choose 'Show Stacks For Event' to see how many times the event was triggered on different stacks in the application.

Lock Contention Using PerfView

PerfView offers an open-source alternative to Visual Studio, although setting up a capture session requires more preparation. In the Collect -> Collect dialog, add this provider to capture contention events with full stacks:

Microsoft-Windows-DotNETRuntime:ContentionKeyword:Always:@StacksEnabled=true

When analyzing data on the same machine, uncheck both the Zip and Merge checkboxes.

To capture and analyze data:

  1. Start Collection
  2. Run the web application and send the requests
  3. Stop Collection
  4. Open the collected etl trace file and navigate to Events

This view shows events from all processes, so you'll need to filter them using both the Process Filter and Event Types. PerfView shows the same data as Visual Studio, but in a rawer, less processed form.

To analyze stacks:

  • Use the context menu to select Open Any Stacks
  • Select the Contention events
  • Double-click to view the Callers tab
  • Expand stack frames to explore captured stacks

PerfView allows exporting to Excel for custom analysis and aggregation of the events.
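Once the events are exported, the aggregation itself is short. Below is a minimal sketch that totals wait time per lock and per call site. The ContentionStop and Summary records, the lock IDs, the call-site names, and the durations are all made up for illustration; a real export would need its rows mapped onto this shape first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for exported Contention/Stop events.
var stops = new List<ContentionStop>
{
    new("lock-A", "WriteHeaders", 120_000),
    new("lock-A", "WriteData", 80_000),
    new("lock-B", "WriteHeaders", 50_000),
};

// Groups events by an arbitrary key and sorts by total wait time, largest first.
static List<Summary> Summarize(IEnumerable<ContentionStop> events, Func<ContentionStop, string> keyOf) =>
    events.GroupBy(keyOf)
          .Select(g => new Summary(g.Key, g.Count(), g.Sum(e => e.DurationNs), g.Max(e => e.DurationNs)))
          .OrderByDescending(s => s.TotalNs)
          .ToList();

var perLock = Summarize(stops, e => e.LockId);
var perCallSite = Summarize(stops, e => e.CallSite);

foreach (var s in perLock.Concat(perCallSite))
    Console.WriteLine($"{s.Key}: {s.Count} waits, total {s.TotalNs} ns, max {s.MaxNs} ns");

record ContentionStop(string LockId, string CallSite, long DurationNs);
record Summary(string Key, int Count, long TotalNs, long MaxNs);
```

Grouping by lock ID shows which lock costs the most overall, while grouping by call site points to the code path that waits the longest.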

Conclusion

Lock contention is a critical performance consideration in multi-threaded applications that can significantly impact scalability. It occurs in two main scenarios:

  • When multiple threads frequently compete for the same lock
  • When threads hold locks for extended periods, causing other threads to starve

This blog post demonstrated two practical approaches to measuring and analyzing lock contention:

  1. Visual Studio Performance Profiler
  2. PerfView

Understanding lock contention patterns through these tools can help developers:

  • Identify bottlenecks in concurrent code
  • Make informed decisions about synchronization strategies
  • Choose between different concurrency patterns (Channels, locks, or custom scheduling)
  • Optimize critical sections for better scalability

For HTTP/2 server implementations specifically, this analysis can guide the choice between different multiplexing strategies and help the server perform well under highly concurrent load.