> Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> > This is similar to the recent improvements to the Advanced SIMD
> > popcount expansion by using SVE. We can utilize SVE to generate more
> > efficient code for scalar mode popcount too.
> >
> > Changes since v1:
> > * v2: Add a new VNx1BI mode and a new test case for V1DI.
> > * v3: Abandon VNx1BI changes and add a new variant of
> >   aarch64_ptrue_reg.
>
> Sorry for the slow review.
>
> The patch looks good though. OK with the changes below:
>
> > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> > b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> > new file mode 100644
> > index 00000000000..f086cae55a2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fgimple" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
>
> It's probably safer to add:
>
> #pragma GCC target "+nosve"
>
> here, so that we don't try to use the SVE instructions.
>
> > +/*
> > +** foo:
> > +** cnt v0.8b, v0.8b
> > +** addv b0, v0.8b
>
> Nothing requires the temporary register to be v0, so this should be something
> like:
>
> cnt (v[0-9]+\.8b), v0\.8b
> addv b0, \1

Good point! I've updated the testcase and pushed the patch as
r15-4579-g9ffcf1f193b47.

Thanks,
Pengxuan

>
> Thanks,
> Richard
>
> > +** ret
> > +*/
> > +__Uint64x1_t __GIMPLE
> > +foo (__Uint64x1_t x)
> > +{
> > + __Uint64x1_t z;
> > +
> > + z = .POPCOUNT (x);
> > + return z;
> > +}
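
For readers following the expected assembly in the test above: the cnt/addv pair counts the set bits in each of the eight bytes and then sums those eight counts. A rough portable C model of that two-step computation might look like the sketch below. The function names byte_popcount and popcount64_bytewise are hypothetical and are not part of the patch or of GCC; this only illustrates the arithmetic, not how the compiler expands .POPCOUNT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count the set bits in one byte.
   This models the per-lane work done by CNT on a .8b vector. */
static unsigned byte_popcount(uint8_t b)
{
    unsigned n = 0;
    while (b) {
        b &= (uint8_t)(b - 1);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: sum the per-byte counts of a 64-bit value.
   This models the across-lanes reduction done by ADDV. */
static unsigned popcount64_bytewise(uint64_t x)
{
    unsigned total = 0;
    for (int i = 0; i < 8; i++)
        total += byte_popcount((uint8_t)(x >> (8 * i)));
    assert(total <= 64);  /* at most 8 bits in each of 8 bytes */
    return total;
}
```

Calling popcount64_bytewise on a value with all bits set returns 64, which still fits in the single byte that addv writes to b0.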