Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization

Robin Dapp via Gcc-patches Mon, 31 Jul 2023 12:38:35 -0700

Hi Juzhe,

> +/* Expand Vector POPCOUNT by parallel popcnt:
> +
> +   int parallel_popcnt(uint32_t n) {
> +   #define POW2(c)      (1U << (c))
> +   #define MASK(c)      (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
> +   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
> +     n = COUNT(n, 0);
> +     n = COUNT(n, 1);
> +     n = COUNT(n, 2);
> +     n = COUNT(n, 3);
> +     n = COUNT(n, 4);
> +   //        n = COUNT(n, 5);  // uncomment this line for 64-bit integers
> +     return n;
> +   #undef COUNT
> +   #undef MASK
> +   #undef POW2
> +   }


That's quite a heavy implementation but I suppose with the proper cost
function it can still be worth it.  Did you also try some alternatives?
WWG comes to mind:

uint64_t c1 = 0x5555555555555555;
uint64_t c2 = 0x3333333333333333;
uint64_t c4 = 0x0F0F0F0F0F0F0F0F;

uint64_t wwg (uint64_t x) {
    x -= (x >> 1) & c1;
    x = ((x >> 2) & c2) + (x & c2);
    x = (x + (x >> 4) ) & c4;
    x *= 0x0101010101010101;
    return x >> 56;
}

>From my recollection this is usually 30-40% faster than the naive tree
adder and also amenable to vectorization.  As long as the multiplication
is not terribly slow, that is.  Mula's algorithm should be significantly
faster even, another 30% IIRC.

I'm not against continuing with the more well-known approach for now
but we should keep in mind that might still be potential for improvement.

>  } // namespace riscv_vector
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c

Any particular reason why the tests are in widen?

> +extern void abort (void) __attribute__ ((noreturn));

Why no __builtin_unreachable as in the other tests? 

> +      asm volatile ("" ::: "memory");

Is this necessary?  I doesn't hurt of course, just wondering.

All in all LGTM in case you'd rather get this upstream now.  We can
always improve later.

Regards
 Robin

Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization

Reply via email to