https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |12.1.0
--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Starting in GCC 12, we get the following on arm64 (with -Ofast):
```
mult_su3_na:
ldp q3, q1, [x1, 16]
ldr q0, [x0, 32]
ldp q2, q4, [x0]
fmul v0.2d, v0.2d, v1.2d
ldr q1, [x1]
fmla v0.2d, v4.2d, v3.2d
fmla v0.2d, v2.2d, v1.2d
faddp d0, v0.2d
ret
```
This is even better than before, due to SLP vectorization happening
(similarly on x86_64 with -mfma).
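
For reference, here is a scalar reading of what the arm64 listing above computes, as a minimal C sketch. This is not the testcase from this bug; the name `dot6_lanes` and the argument layout (two pointers to six contiguous doubles, matching the three 16-byte loads from each of x0 and x1) are assumptions made only to illustrate the instruction sequence.

```c
/* Hypothetical helper: a lane-by-lane reading of the AArch64 listing.
   Assumes a and b each point to six contiguous doubles. */
double dot6_lanes(const double *a, const double *b)
{
    /* fmul v0.2d, v0.2d, v1.2d: product of the third pair (offset 32) */
    double lane0 = a[4] * b[4];
    double lane1 = a[5] * b[5];

    /* fmla v0.2d, v4.2d, v3.2d: accumulate the second pair (offset 16).
       The hardware instruction is fused; whether this C is contracted
       into an FMA depends on the compiler flags. */
    lane0 += a[2] * b[2];
    lane1 += a[3] * b[3];

    /* fmla v0.2d, v2.2d, v1.2d: accumulate the first pair (offset 0) */
    lane0 += a[0] * b[0];
    lane1 += a[1] * b[1];

    /* faddp d0, v0.2d: add the two lanes into the scalar result */
    return lane0 + lane1;
}
```

Read this way, the sequence is a six-element dot product held in one two-lane accumulator, with a single pairwise add at the end.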
With -fno-tree-vectorize, -Ofast on x86_64 is slightly better than 13, by one
instruction.
I am not sure if this matters any more due to the SLP improvement ...