https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |12.1.0

--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Starting in GCC 12 we get on arm64 (with -Ofast):
```
mult_su3_na:
        ldp     q3, q1, [x1, 16]
        ldr     q0, [x0, 32]
        ldp     q2, q4, [x0]
        fmul    v0.2d, v0.2d, v1.2d
        ldr     q1, [x1]
        fmla    v0.2d, v4.2d, v3.2d
        fmla    v0.2d, v2.2d, v1.2d
        faddp   d0, v0.2d
        ret
```

Which is even better than before (similarly on x86_64 with -mfma), due to SLP
vectorization happening.
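For readers who do not read AArch64 assembly, here is a rough C sketch of what
the listing above appears to compute. It is a reading of the assembly, not the
test case from this bug: the function names `dot6_scalar` and `dot6_two_lanes`
are hypothetical, and it assumes each argument points to six contiguous doubles.
The second function mirrors the two vector lanes (one `fmul`, two `fmla`, then
`faddp` to add the lanes).

```c
/* Hypothetical reconstruction from the assembly; names are illustrative only. */

/* Straightforward form: a six-element dot product summed in order. */
double dot6_scalar(const double *a, const double *b)
{
    double sum = 0.0;
    for (int i = 0; i < 6; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Same value, associated the way the vectorized code does it:
   two independent partial sums (one per 64-bit lane), added at the end. */
double dot6_two_lanes(const double *a, const double *b)
{
    double lane0 = a[4] * b[4];   /* fmul  v0.2d, [x0+32], [x1+32] */
    double lane1 = a[5] * b[5];

    lane0 += a[2] * b[2];         /* fmla  v0.2d, [x0+16], [x1+16] */
    lane1 += a[3] * b[3];

    lane0 += a[0] * b[0];         /* fmla  v0.2d, [x0], [x1] */
    lane1 += a[1] * b[1];

    return lane0 + lane1;         /* faddp d0, v0.2d */
}
```

The two forms add the products in a different order, so they can differ in the
last bits for general inputs. The compiler is only allowed to make that change
because -Ofast enables -ffast-math.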

With -fno-tree-vectorize, -Ofast on x86_64 is slightly better than 13, by one
instruction.

I am not sure if this matters any more, given the SLP improvement ...
