https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
SI (and DI) can be optimized too.

LLVM produces for int:
```
        ldr     d0, [x0]
        cnt     v0.8b, v0.8b
        uaddlp  v0.4h, v0.8b
        uaddlp  v0.2s, v0.4h
        str     d0, [x1]
        ret
```

And for long:
```
        ldr     q0, [x0]
        cnt     v0.16b, v0.16b
        uaddlp  v0.8h, v0.16b
        uaddlp  v0.4s, v0.8h
        uaddlp  v0.2d, v0.4s
        str     q0, [x1]
        ret
```
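
For reference, here is a rough scalar model of what the 128-bit sequence computes: a per-byte bit count (cnt) followed by three pairwise widening adds (uaddlp). The function name `popcount_2x64_model` is made up for illustration and is not taken from GCC or LLVM; this is only a sketch of the data flow, not how either compiler implements it.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of: ldr q0; cnt; uaddlp x3; str q0.
   Hypothetical helper, for illustration only. */
static void popcount_2x64_model(const uint64_t *b, uint64_t *d)
{
    uint8_t  bytes[16];
    uint16_t h[8];
    uint32_t s[4];

    memcpy(bytes, b, sizeof bytes);          /* ldr q0, [x0] */

    for (int i = 0; i < 16; i++)             /* cnt v0.16b, v0.16b */
        bytes[i] = (uint8_t)__builtin_popcount(bytes[i]);

    for (int i = 0; i < 8; i++)              /* uaddlp v0.8h, v0.16b */
        h[i] = (uint16_t)(bytes[2 * i] + bytes[2 * i + 1]);

    for (int i = 0; i < 4; i++)              /* uaddlp v0.4s, v0.8h */
        s[i] = (uint32_t)h[2 * i] + h[2 * i + 1];

    for (int i = 0; i < 2; i++)              /* uaddlp v0.2d, v0.4s; str q0, [x1] */
        d[i] = (uint64_t)s[2 * i] + s[2 * i + 1];
}
```

The int case has the same shape with 8 bytes and one fewer uaddlp step.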

The LLVM assembly above is for the SLP version:
```
void f(unsigned long *  __restrict b, unsigned long * __restrict d)
{
    d[0]  = __builtin_popcountll(b[0]);
    d[1]  = __builtin_popcountll(b[1]);
}
```
s/long/int/ in the first case.
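
Spelled out, the int variant would look roughly like the following. The name `f_int` and the switch to `__builtin_popcount` are assumptions here; the literal substitution would keep `__builtin_popcountll`, which gives the same result for an unsigned int argument.

```c
/* int variant of the SLP testcase; f_int is a hypothetical name. */
void f_int(unsigned int *__restrict b, unsigned int *__restrict d)
{
    d[0] = __builtin_popcount(b[0]);
    d[1] = __builtin_popcount(b[1]);
}
```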

Note that using SVE is better than the above when it is available, but that is
part of PR 113860.
