Every computer science student is told that floating-point arithmetic is fraught. There be dragons. Stack Overflow and other Q&A sites have a mountain of “is floating-point math broken?” questions. But I was today years old, approximately 51.07 (floating-point) years, before I ran into a situation where I actually met the dragons.

As a general rule in mathematics (for sensible algebraic structures), a + b + c is the same as c + a + b. Not so in floating-point math: addition there is commutative but not associative, so (a + b) + c and (c + a) + b can round to different values, and the order in which you accumulate a sum matters.

At work-work we have a codebase that goes back 30 years. Parts of it are C++23 and cutting edge, and parts of it are … not so much. Sometimes my job is figuring out what an undocumented function does, so we can add documentation and unit-tests and so nudge it towards modernity.

Let me take a moment to recommend Clare Macrae as a source of good talks on testing – tips, techniques, and the process of acceptance testing. She’s been an inspiration for the things I do these days.

My general approach to this kind of problem is “read the code, try to understand it, then write a reimplementation of it in modern terms.” The implementation and the reimplementation live side by side. The new code has documentation and, of course, unit-tests that express what it does.

But the litmus test of the new code is not what it does, but does-it-do-what-the-old-code-does. If it isn’t a faithful, and possibly bug-for-bug-compatible, reimplementation, then it’s not all that useful. The equivalence testing is done with a bunch more unit-tests and a lot of random data.
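To give a flavour of what those equivalence tests look like, here is a minimal sketch. The function names legacy_sum and modern_sum are stand-ins of my own invention (the real code is a windowed computation, not a plain sum), but the shape is the same: feed both implementations the same random data and demand exactly the same answer.

#include <fmt/core.h>
#include <numeric>
#include <random>
#include <vector>

// Stand-in for the old implementation: a plain left-to-right loop.
float legacy_sum(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Stand-in for the reimplementation. std::accumulate also adds left to
// right, so in this toy example the two agree bit for bit.
float modern_sum(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0f);
}

int main() {
    std::mt19937 rng{42};
    std::uniform_real_distribution<float> dist(-100.0f, 100.0f);

    for (int run = 0; run < 1000; ++run) {
        std::vector<float> data(256);
        for (float& x : data) x = dist(rng);

        // Exact equality on purpose: the goal is to match the old code
        // bit for bit, not to be merely close.
        if (legacy_sum(data) != modern_sum(data)) {
            fmt::print("mismatch in run {}\n", run);
            return 1;
        }
    }
    fmt::print("all runs match\n");
    return 0;
}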

One function I had to reimplement sums a bunch of floats and does an awful lot of memory-thrashing; I figured out it was doing a sliding window over a buffer of floats. I wrote a little class that implements the sliding-window part. It did the obvious thing, and my silly tests like “compute the average of the integers 1..100” worked.
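For illustration, here is roughly what I mean by “the obvious thing” — a from-memory sketch, not the actual class from work. The name SlidingWindow, the interface, and the running-sum shortcut are all my own; the real code looks different.

#include <cstddef>
#include <fmt/core.h>
#include <vector>

// Hypothetical sketch: a fixed-size window with a running sum that is
// updated incrementally as values come in.
class SlidingWindow {
public:
    explicit SlidingWindow(std::size_t size) : values_(size, 0.0f) {}

    void push(float v) {
        sum_ -= values_[next_];               // drop the oldest value
        sum_ += v;                            // add the newest
        values_[next_] = v;
        next_ = (next_ + 1) % values_.size();
        if (count_ < values_.size()) ++count_;
    }

    float average() const {
        return count_ ? sum_ / static_cast<float>(count_) : 0.0f;
    }

private:
    std::vector<float> values_;
    std::size_t next_ = 0;
    std::size_t count_ = 0;
    float sum_ = 0.0f;
};

int main() {
    // The silly test: the average of the integers 1..100 is 50.5.
    SlidingWindow w(100);
    for (int i = 1; i <= 100; ++i) w.push(static_cast<float>(i));
    fmt::print("average = {}\n", w.average());
    return 0;
}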

Naturally, it failed the litmus tests. Given a buffer filled with random data, it would always fail to compute the same values as the existing code, although the differences were in the 6th decimal digit. Much fmt::print() later, I boiled it down to differences in floating-point summation order. Here’s a tiny demo program that shows the problem as well:

#include <fmt/core.h>

int main() {
    float inputs[4] = { 10.17f, -11.01f, 3.114677f, -3.109f };

    // Sum front to back.
    float sum0 = 0.0f;
    for (int i = 0; i < 4; ++i) sum0 += inputs[i];

    // Sum the same values back to front.
    float sum1 = 0.0f;
    for (int i = 3; 0 <= i; --i) sum1 += inputs[i];

    fmt::print("sum0={}\nsum1={}\n", sum0, sum1);
    return 0;
}

On my amd64 machine with clang 15, summing these four numbers one way yields -0.83432317, and the other way, -0.8343229. And with optimization enabled, yet other values.

So I had to dig into my reimplementation and change the “obvious” summation order to match the order in which the values were added together in the original implementation. Floating-point rounding errors pile up, and in my mind the image of Dr. Ben Polman, who taught me numerical computing in 1997 or so, appeared with a disappointed tut-tut frown on his face.
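To get a feeling for how fast two orderings drift apart, here is a small experiment of my own (not code from the work codebase): slide a 16-wide window over random floats and compare a sum that is updated incrementally with one that is recomputed from scratch, front to back, at every step.

#include <cstddef>
#include <fmt/core.h>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng{1};
    std::uniform_real_distribution<float> dist(-100.0f, 100.0f);

    std::vector<float> data(10000);
    for (float& x : data) x = dist(rng);

    const std::size_t window = 16;

    // Running sum of the first window, updated incrementally from here on.
    float incremental = 0.0f;
    for (std::size_t i = 0; i < window; ++i) incremental += data[i];

    std::size_t mismatches = 0;
    for (std::size_t i = window; i < data.size(); ++i) {
        incremental += data[i] - data[i - window];   // incremental update

        float fresh = 0.0f;                          // recompute front to back
        for (std::size_t j = i - window + 1; j <= i; ++j) fresh += data[j];

        if (incremental != fresh) ++mismatches;
    }
    fmt::print("{} of {} windows differ\n", mismatches, data.size() - window);
    return 0;
}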

Floating-point. It is fraught. There are dragons. Personally, I think I prefer Konqi.