https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121952

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
LLVM produces this for the loop (with -O2 -fno-vectorize):
.LBB0_2:
        and     w12, w9, #0x1
        subs    x11, x11, #1
        sub     x9, x9, #1
        strb    w12, [x10], #1
        b.ne    .LBB0_2


But really it should just be:
.LBB0_2:
        subs    x11, x11, #1
        strb    w9, [x10], #1 ;; post increment
        xor     w9, w9, #1
        b.ne    .LBB0_2

GCC produces this:
.L3:
        add     w1, w3, w0
        and     w1, w1, 1
        strb    w1, [x0, 1]! // pre increment
        cmp     x0, x2
        bne     .L3

But that is because the IV is the pointer, with the offset w3 added to it.

Also LLVM vectorization is worse:
```
.LBB0_6:
        sub     v20.2d, v0.2d, v18.2d
        sub     v24.2d, v0.2d, v4.2d
        subs    x12, x12, #16
        sub     v21.2d, v0.2d, v7.2d
        sub     v25.2d, v0.2d, v3.2d
        add     v7.2d, v7.2d, v17.2d
        sub     v22.2d, v0.2d, v6.2d
        sub     v26.2d, v0.2d, v2.2d
        add     v6.2d, v6.2d, v17.2d
        sub     v23.2d, v0.2d, v5.2d
        sub     v27.2d, v0.2d, v1.2d
        add     v18.2d, v18.2d, v17.2d
        add     v5.2d, v5.2d, v17.2d
        add     v4.2d, v4.2d, v17.2d
        add     v3.2d, v3.2d, v17.2d
        add     v2.2d, v2.2d, v17.2d
        add     v1.2d, v1.2d, v17.2d
        tbl     v20.16b, { v20.16b, v21.16b, v22.16b, v23.16b }, v19.16b
        tbl     v21.16b, { v24.16b, v25.16b, v26.16b, v27.16b }, v19.16b
        mov     v20.d[1], v21.d[0]
        and     v20.16b, v20.16b, v16.16b
        str     q20, [x11], #16
```

This is 99% due to the add/sub still being inside the loop: the vectorizer does
not figure out that this is just `^1` for the induction variable in the end.
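
To illustrate the `^1` point, here is a minimal C sketch. It is not the testcase from this bug: `fill_and`, `fill_xor`, and `same_output` are hypothetical names, and the loop shape (store the low bit of a counter that steps by one each iteration) is assumed from the assembly above. Under that assumption, the stored bit simply alternates, so the `and` of a decrementing counter can be replaced by toggling a pre-masked bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed loop shape (hypothetical, not the bug's testcase):
   store the low bit of a counter that decreases by one per iteration. */
void fill_and(uint8_t *dst, size_t n, uint64_t start)
{
    uint64_t x = start;
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(x & 1);  /* and + store */
        x -= 1;                     /* separate counter update */
    }
}

/* Same output, but the low bit is masked once before the loop and then
   toggled with xor, matching the "should be" loop shape. */
void fill_xor(uint8_t *dst, size_t n, uint64_t start)
{
    uint8_t bit = (uint8_t)(start & 1);
    for (size_t i = 0; i < n; i++) {
        dst[i] = bit;   /* store only */
        bit ^= 1;       /* stepping by one always flips the low bit */
    }
}

/* Returns 1 when both loops write identical bytes for the first n
   elements (n is clamped to 64). */
int same_output(size_t n, uint64_t start)
{
    uint8_t a[64], b[64];
    if (n > 64)
        n = 64;
    fill_and(a, n, start);
    fill_xor(b, n, start);
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

The equivalence holds because adding or subtracting one always flips bit 0, regardless of the higher bits, so only that single bit needs to be carried across iterations.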
