SIMD Sum 203/22/2026
In this post, I'll explore how to optimize a simple array-summing operation using SIMD (Single Instruction, Multiple Data) operations in C#. The example is inspired by Matt Godbolt's GOTO 2024 talk "What Every Programmer Should Know about How CPUs Work", which covers CPU architecture and branch prediction. We'll see how SIMD instructions can dramatically improve performance by reducing branch mispredictions and processing multiple elements in parallel.
Part of the talk describes how CPU branch prediction works. It uses a simple task for demonstration: a method is given a large set of random numbers and computes two results, the sum of all the numbers and, separately, the sum of only the numbers below 128. The talk shows sample implementations in Python and C++ and explains the reasons for the observed performance difference.
In this post I will implement this example in C# with a single difference: the input numbers are bytes rather than ints.
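To make the task concrete, here is a minimal scalar sketch of it in C#. This is only an illustration of the problem statement, not the implementation from the talk or the final version in this post; the names `ScalarSum` and `Sum` are hypothetical, and the use of `long` accumulators is an assumption made so that large inputs cannot overflow.

```csharp
using System;

public static class ScalarSum
{
    // Hypothetical baseline: returns the sum of all bytes and,
    // separately, the sum of the bytes whose value is below 128.
    public static (long Total, long Below128) Sum(ReadOnlySpan<byte> data)
    {
        long total = 0;
        long below128 = 0;

        foreach (byte b in data)
        {
            total += b;

            // Data-dependent condition: for random input it is true
            // about half the time, so its outcome is hard to predict.
            if (b < 128)
            {
                below128 += b;
            }
        }

        return (total, below128);
    }
}
```

For example, `ScalarSum.Sum(new byte[] { 1, 127, 128, 255 })` returns `(511, 128)`: all four values add up to 511, while only 1 and 127 are below 128.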