[Bug target/114107] poor vectorization at -O3 when dealing with arrays of different multiplicity, good with -O2

pinskia at gcc dot gnu.org via Gcc-bugs Sun, 25 Feb 2024 15:57:05 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107


Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-linux

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I am not 100% sure that is always better.

What is happening is that GCC is vectorizing the outer loop as well.

It is also easier to see in the aarch64 asm:
.L4:
        ldr     q27, [x3], 16
        ld4     {v28.2d - v31.2d}, [x4]
        fmul    v24.2d, v27.2d, v28.2d
        fmul    v25.2d, v27.2d, v29.2d
        fmul    v26.2d, v27.2d, v30.2d
        fmul    v27.2d, v27.2d, v31.2d
        st4     {v24.2d - v27.2d}, [x4], 64
        cmp     x3, x5
        bne     .L4
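
For anyone reading the asm without the source at hand, a loop of roughly
the following shape would match it. This is only a sketch inferred from the
instructions above (one 2-double load, then ld4/st4 over four interleaved
vectors); it is not the testcase attached to the bug, and the name
scale_groups, the restrict qualifiers, and the group width of 4 are
assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical reduction, NOT the testcase from the bug report.
   Each b[i] scales one group of four consecutive elements of a,
   so a has four times as many elements as b. */
void scale_groups(double *restrict a, const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)          /* outer loop: one b[i] per group */
        for (size_t j = 0; j < 4; j++)      /* inner loop: the group of four  */
            a[4 * i + j] *= b[i];
}
```

In the asm, two iterations of the outer loop are handled per vector
iteration: ldr loads b[i] and b[i+1], ld4 de-interleaves eight elements of a
into four registers, and st4 re-interleaves the products. Compiling such a
function at -O2 and at -O3 with -S and timing both is one way to check which
form is actually faster on a given machine.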

Have you benchmarked both?

If anything, this is a cost model issue.
