https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #32 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Rocco Tormenta from comment #31)
> Hello, I have another basic example. I encountered this issue today while
> trying to calculate the squared n-dimensional Euclidean distance between two
> points. I apologize if this is not the same issue, though I think it is at
> least related given that it broke on the same version.
> 
> https://gcc.godbolt.org/z/1cTcazh3Y
> 
> That workspace includes proof of concept code, as well as examples of known
> workarounds.
> 
> The relevant function is:
> 
> float nd_sq_euclid(float *a, float *b, int n)  {
>     float dist = 0.0;
>     for (int i = 0; i < n; i++) {
>         float d1 = a[i] - b[i];
>         dist += d1 * d1;
>     }
>     return dist;
> }
> 
> From the generated assembly alone, you can see that some of the code paths
> (namely, for n >= 3) do not use FMA.
> 
> I included some values (a, b, c) that have different results for FMA and
> non-FMA operations, so it is easier to see the difference. As you can see,
> by padding the input with zeroes and increasing n, it works fine for n=1 and
> n=2, but breaks starting from n=3.

The easy workaround is to use __builtin_assoc_barrier, for example:
```
float nd_sq_euclid(float *a, float *b, int n)  {
    float dist = 0.0;
    for (int i = 0; i < n; i++) {
        float d1 = a[i] - b[i];
        dist += __builtin_assoc_barrier (d1 * d1);
    }
    return dist;
}
```

Note that this has always worked to avoid FMA formation since
__builtin_assoc_barrier was added, but it has only been documented recently.
See
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fassoc_005fbarrier
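
If the same source also has to build with a compiler that lacks the builtin,
a guarded variant along these lines might help. This is only a sketch:
NO_CONTRACT and nd_sq_euclid_barrier are made-up names for illustration, and
it assumes that __has_builtin reports __builtin_assoc_barrier where it is
available. In the fallback branch nothing prevents FMA formation, so there the
result still depends on the compiler's contraction settings.
```c
/* NO_CONTRACT is a hypothetical helper macro, not something GCC provides. */
#ifdef __has_builtin
#  if __has_builtin(__builtin_assoc_barrier)
#    define NO_CONTRACT(x) __builtin_assoc_barrier(x)
#  endif
#endif
#ifndef NO_CONTRACT
/* Fallback: plain expression, so contraction into an FMA is NOT prevented. */
#  define NO_CONTRACT(x) (x)
#endif

/* Squared n-dimensional Euclidean distance between a and b. */
float nd_sq_euclid_barrier(const float *a, const float *b, int n)
{
    float dist = 0.0f;
    for (int i = 0; i < n; i++) {
        float d1 = a[i] - b[i];
        /* When the builtin is available, the multiply stays a separate
           operation instead of being fused with the add into dist. */
        dist += NO_CONTRACT(d1 * d1);
    }
    return dist;
}
```
Inputs whose differences and squares are exactly representable in float (small
integers, for instance) give the same answer with or without FMA, so they only
check that the function is wired up correctly; they do not show the rounding
difference.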
