Pengxuan Zheng <[email protected]> writes:
> This is similar to the recent improvements to the Advanced SIMD popcount
> expansion by using SVE. We can utilize SVE to generate more efficient code for
> scalar mode popcount too.
>
> Changes since v1:
> * v2: Add a new VNx1BI mode and a new test case for V1DI.
> * v3: Abandon VNx1BI changes and add a new variant of aarch64_ptrue_reg.
Sorry for the slow review.
The patch looks good though. OK with the changes below:
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt12.c b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> new file mode 100644
> index 00000000000..f086cae55a2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fgimple" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +

It's probably safer to add:

  #pragma GCC target "+nosve"

here, so that we don't try to use the SVE instructions.

> +/*
> +** foo:
> +** cnt v0.8b, v0.8b
> +** addv b0, v0.8b

Nothing requires the temporary register to be v0, so this should be
something like:

  cnt (v[0-9]+\.8b), v0\.8b
  addv b0, \1

Thanks,
Richard
> +** ret
> +*/
> +__Uint64x1_t __GIMPLE
> +foo (__Uint64x1_t x)
> +{
> + __Uint64x1_t z;
> +
> + z = .POPCOUNT (x);
> + return z;
> +}
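
For reference, here is a minimal C sketch of what the expected cnt/addv sequence computes for a single 64-bit lane. It is only a scalar model for illustration, not part of the patch: the helper names cnt_8b, addv_8b and popcount_v1di_model are hypothetical, and the code assumes the usual semantics of CNT (per-byte population count) and ADDV (sum across byte lanes, truncated to 8 bits).

```c
#include <stdint.h>

/* Hypothetical model of "cnt vD.8b, vN.8b": replace each of the 8 bytes
   of X with the number of set bits in that byte.  */
static uint64_t cnt_8b(uint64_t x)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t byte = (uint8_t)(x >> (8 * lane));
        uint8_t bits = 0;
        while (byte) {
            bits += byte & 1u;   /* count the low bit */
            byte >>= 1;          /* move to the next bit */
        }
        result |= (uint64_t)bits << (8 * lane);
    }
    return result;
}

/* Hypothetical model of "addv bD, vN.8b": add the 8 byte lanes of V,
   keeping only the low 8 bits of the sum.  */
static uint8_t addv_8b(uint64_t v)
{
    uint8_t sum = 0;
    for (int lane = 0; lane < 8; lane++)
        sum = (uint8_t)(sum + (uint8_t)(v >> (8 * lane)));
    return sum;
}

/* The two steps combined: each per-byte count is at most 8, so the
   sum is at most 64 and always fits in the 8-bit result.  */
static uint64_t popcount_v1di_model(uint64_t x)
{
    return addv_8b(cnt_8b(x));
}
```

The model also shows why the intermediate register does not matter: only the input to cnt and the destination of addv are fixed by the calling convention, while the per-byte counts can live in any vector register.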