https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109498

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|14.0                        |
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |ktkachov at gcc dot gnu.org
   Last reconfirmed|                            |2024-07-31

--- Comment #2 from ktkachov at gcc dot gnu.org ---
This does get vectorised with SVE in GCC 14, but not optimally. It doesn't use
the recommended RBIT + CLZ sequence; instead it isolates the lowest set bit
with NEG + AND and computes 31 - CLZ of that:
ctz:
        cmp     w2, 0
        ble     .L1
        mov     x3, 0
        whilelo p7.s, wzr, w2
        ptrue   p6.b, all
.L3:
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]
        movprfx z30, z31
        neg     z30.s, p6/m, z31.s
        and     z30.d, z30.d, z31.d
        clz     z30.s, p6/m, z30.s
        subr    z30.s, z30.s, #31
        st1w    z30.s, p7, [x0, x3, lsl 2]
        incw    x3
        whilelo p7.s, w3, w2
        b.any   .L3
.L1:
        ret

with -Ofast -march=armv9-a --param aarch64-autovec-preference=2 .
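
For reference, here is a rough scalar sketch in C of the two formulations
being compared: the NEG + AND + CLZ + SUBR #31 sequence from the listing
above, and the RBIT + CLZ sequence. The helper names (clz32, rbit32,
ctz_neg_and_clz, ctz_rbit_clz, check_nonzero_inputs_agree) are hypothetical
and written in portable C purely for illustration; they are not taken from
the testcase or from GCC. The two forms should agree for nonzero inputs. For
a zero input they differ (the first wraps around, the second yields 32), so
the check below only covers nonzero values.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros; returns 32 for x == 0, as CLZ does. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Portable 32-bit bit reversal, standing in for RBIT. */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Shape of the generated code: NEG, AND, CLZ, SUBR #31. */
static unsigned ctz_neg_and_clz(uint32_t x)
{
    uint32_t lowest = x & (0u - x);   /* isolate the lowest set bit */
    return 31u - clz32(lowest);
}

/* Shape of the recommended code: RBIT, CLZ. */
static unsigned ctz_rbit_clz(uint32_t x)
{
    return clz32(rbit32(x));
}

/* Checks that both forms agree on a few nonzero sample values. */
static void check_nonzero_inputs_agree(void)
{
    static const uint32_t samples[] = {
        1u, 2u, 8u, 0x00F0F000u, 0x80000000u, 0xFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(ctz_neg_and_clz(samples[i]) == ctz_rbit_clz(samples[i]));
}
```

The RBIT + CLZ form needs two operations per element where the emitted
sequence needs four (plus the MOVPRFX), which is presumably why it is the
preferred expansion.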
