Floating away
Every computer science student is told that floating-point arithmetic is fraught. There be dragons. Stack Overflow and other Q&A sites have a mountain of “is floating-point math broken?” questions. But I was today years old, approximately 51.07 (floating-point) years, before I ran into a situation where I actually met the dragons.
As a general rule in mathematics (for sensible algebraic structures), a + b + c is the same as c + a + b. Not so in floating-point math.
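For a concrete example, pick 32-bit float values where the effect is dramatic rather than subtle:

#include <fmt/core.h>

int main() {
    float a = 1.0e8f, b = 1.0f, c = -1.0e8f;
    fmt::print("(a + b) + c = {}\n", (a + b) + c);  // b is swallowed by a: prints 0
    fmt::print("(c + a) + b = {}\n", (c + a) + b);  // c and a cancel first: prints 1
    return 0;
}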
At work-work we have a codebase that goes back 30 years. Parts of it are C++23 and cutting edge, and parts of it are … not so much. Sometimes my job is figuring out what an undocumented function does, so we can add documentation and unit-tests, and so nudge it towards modernity.
Let me take a moment to recommend Clare Macrae as a source of good talks on testing – tips, techniques, and the process of acceptance testing. She’s been an inspiration for the things I do these days.
My general approach to this kind of problem is “read the code, try to understand it, then write a reimplementation of it in modern terms.” The implementation and reimplementation live side by side. The new code has documentation and, of course, unit-tests that express what it does.
But the litmus test of the new code is not what it does, but does-it-do-what-the-old-code-does. If it isn’t a faithful, and possibly bug-for-bug-compatible, reimplementation, then it’s not all that useful. The equivalence testing is done with a bunch more unit-tests and a lot of random data.
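The equivalence tests themselves aren’t sophisticated. A minimal sketch of the idea, with a made-up check_equivalence() helper that takes the old and new code as callables (none of these names are from the real codebase):

#include <cassert>
#include <random>
#include <vector>

// Feed both implementations the same random buffers and insist on
// bit-for-bit identical results. A fixed seed keeps failures reproducible.
template <typename OldFn, typename NewFn>
void check_equivalence(OldFn&& old_code, NewFn&& new_code, int runs = 1000)
{
    std::mt19937 gen(12345);
    std::uniform_real_distribution<float> dist(-100.0f, 100.0f);

    for (int run = 0; run < runs; ++run) {
        std::vector<float> data(256);
        for (float& value : data) value = dist(gen);

        // “Close enough” is not the goal; faithful is.
        assert(old_code(data) == new_code(data));
    }
}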
For a reimplementation of a function that sums a bunch of floats and does an awful lot of memory-thrashing, I figured out it was doing a sliding window over a buffer of floats. I wrote a little class that implements the sliding-window part. It did the obvious thing, and my silly tests like “compute the average of integers 1..100” worked.
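For flavour, here is a minimal sketch of such a sliding-window average, assuming a running sum that drops the outgoing sample and adds the incoming one (an illustration, not the actual work code):

#include <cstddef>
#include <vector>

class SlidingWindowAverage {
public:
    explicit SlidingWindowAverage(std::size_t size) : buffer_(size, 0.0f) {}

    void push(float value)
    {
        sum_ -= buffer_[next_];              // drop the oldest sample
        sum_ += value;                       // add the newest one
        buffer_[next_] = value;
        next_ = (next_ + 1) % buffer_.size();
        if (count_ < buffer_.size()) ++count_;
    }

    float average() const
    {
        return count_ ? sum_ / static_cast<float>(count_) : 0.0f;
    }

private:
    std::vector<float> buffer_;              // the window, oldest sample at next_
    std::size_t next_ = 0;
    std::size_t count_ = 0;
    float sum_ = 0.0f;                       // running sum of the window
};

With a window of 100 and the integers 1 through 100 pushed in, every intermediate sum is exactly representable, so average() returns exactly 50.5 and the silly test passes without complaint.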
Naturally, it failed the litmus tests. Given a buffer filled with random data, it would always fail to compute the same values as the existing code, although the differences were in the 6th decimal digit. Much fmt::print() later, I boiled it down to differences in floating-point summation order. Here’s a tiny demo program that shows the problem as well:
#include <fmt/core.h>

int main() {
    float inputs[4] = { 10.17f, -11.01f, 3.114677f, -3.109f };

    // Sum front to back...
    float sum0 = 0.0f;
    for (int i = 0; i < 4; ++i) sum0 += inputs[i];

    // ...and back to front.
    float sum1 = 0.0f;
    for (int i = 3; 0 <= i; --i) sum1 += inputs[i];

    fmt::print("sum0={}\nsum1={}\n", sum0, sum1);
    return 0;
}
On my amd64 machine with clang 15, summing these four numbers one way yields -0.83432317 and the other way -0.8343229. And with optimization enabled, other values again.
So I had to dig into my reimplementation and change the “obvious” summation order to match the order in which values were added together in the original implementation. Floating-point rounding errors pile up, and in my mind the image of Dr. Ben Polman, who taught me numerical computing in 1997 or so, appeared with a disappointed tut-tut frown on his face.
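Matching the order comes down to spelling it out, instead of letting it fall out of whatever the data structure happens to do. Purely as an illustration (the back-to-front choice here is made up, not the order the original code uses):

#include <cstddef>
#include <vector>

// Sum a window in one explicit, fixed order (here: back to front, purely
// as an example), so the rounding behaviour is chosen on purpose rather
// than inherited by accident.
float window_sum(const std::vector<float>& window)
{
    float sum = 0.0f;
    for (std::size_t i = window.size(); i-- > 0; )
        sum += window[i];
    return sum;
}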
Floating-point. It is fraught. There are dragons. Personally, I think I prefer Konqi.