https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902
--- Comment #32 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Rocco Tormenta from comment #31)
> Hello, I have another basic example. I encountered this issue today while
> trying to calculate the squared n-dimensional Euclidean distance between two
> points. I apologize if this is not the same issue, though I think it is at
> least related given that it broke on the same version.
>
> https://gcc.godbolt.org/z/1cTcazh3Y
>
> That workspace includes proof of concept code, as well as examples of known
> workarounds.
>
> The relevant function is:
>
> float nd_sq_euclid(float *a, float *b, int n) {
>     float dist = 0.0;
>     for (int i = 0; i < n; i++) {
>         float d1 = a[i] - b[i];
>         dist += d1 * d1;
>     }
>     return dist;
> }
>
> From the generated assembly alone, you can see that some of the code paths
> (namely, for n >= 3) do not use FMA.
>
> I included some values (a, b, c) that have different results for FMA and
> non-FMA operations, so it is easier to see the difference. As you can see,
> by padding the input with zeroes and increasing n, it works fine for n=1 and
> n=2, but breaks starting from n=3.

The easy workaround is to use __builtin_assoc_barrier, for example:
```
float nd_sq_euclid(float *a, float *b, int n) {
    float dist = 0.0;
    for (int i = 0; i < n; i++) {
        float d1 = a[i] - b[i];
        dist += __builtin_assoc_barrier (d1 * d1);
    }
    return dist;
}
```

Note that this has always worked to avoid FMA formation since
__builtin_assoc_barrier was added, but it has only been documented recently.
See
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fassoc_005fbarrier
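
If the same source also has to build with a compiler that does not provide __builtin_assoc_barrier, the workaround can be wrapped in a small helper. This is only a sketch under assumptions: square_rounded and nd_sq_euclid_portable are hypothetical names that do not appear in this report, the fallback path (a volatile temporary that forces the product to be rounded before the add) is a swapped-in technique rather than anything suggested above, and it relies on __has_builtin reporting the barrier correctly on the compiler in use.

```c
/* Older compilers may not define __has_builtin at all; treat that as
   "builtin not available". */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical helper: returns x * x as a separately rounded float so the
   multiply is not contracted into the following add. */
static inline float square_rounded(float x)
{
#if __has_builtin(__builtin_assoc_barrier)
    /* Same workaround as in the comment above. */
    return __builtin_assoc_barrier(x * x);
#else
    /* Assumed fallback: the volatile store forces the product to be
       materialized and rounded before it is used. */
    volatile float product = x * x;
    return product;
#endif
}

/* Squared n-dimensional Euclidean distance without FMA formation. */
float nd_sq_euclid_portable(const float *a, const float *b, int n)
{
    float dist = 0.0f;
    for (int i = 0; i < n; i++) {
        float d1 = a[i] - b[i];
        dist += square_rounded(d1);
    }
    return dist;
}
```

The volatile fallback will likely cost more than the builtin because it forces a store and reload on every iteration, so it is only meant as a last resort when consistent rounding matters more than speed.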