> Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> > This is similar to the recent improvements to the Advanced SIMD
> > popcount expansion by using SVE. We can utilize SVE to generate more
> > efficient code for scalar mode popcount too.
> >
> > Changes since v1:
> > * v2: Add a new VNx1BI mode and a new test case for V1DI.
> > * v3: Abandon VNx1BI changes and add a new variant of aarch64_ptrue_reg.
> 
> Sorry for the slow review.
> 
> The patch looks good though.  OK with the changes below:
> 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt12.c b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> > new file mode 100644
> > index 00000000000..f086cae55a2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fgimple" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> 
> It's probably safer to add:
> 
> #pragma GCC target "+nosve"
> 
> here, so that we don't try to use the SVE instructions.
> 
> > +/*
> > +** foo:
> > +** cnt     v0.8b, v0.8b
> > +** addv    b0, v0.8b
> 
> Nothing requires the temporary register to be v0, so this should be something
> like:
> 
>       cnt     (v[0-9]+\.8b), v0\.8b
>       addv    b0, \1

Good point! I've updated the testcase and pushed the patch as 
r15-4579-g9ffcf1f193b47.

Thanks,
Pengxuan
> 
> Thanks,
> Richard
> 
> > +** ret
> > +*/
> > +__Uint64x1_t __GIMPLE
> > +foo (__Uint64x1_t x)
> > +{
> > +  __Uint64x1_t z;
> > +
> > +  z = .POPCOUNT (x);
> > +  return z;
> > +}
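
For illustration only, here is a rough sketch of why the capture-and-backreference form suggested in the review is more robust than hard-coding v0. It uses Python's re module rather than the Tcl regexps that check-function-bodies actually uses, and the whitespace handling (\s+) and the sample assembly strings are assumptions of this sketch, not compiler output or part of the committed test.

```python
import re

# Pattern modelled on the review suggestion: capture whichever vector
# register cnt writes to, then require addv to read that same register
# through the \1 backreference. The \s+ separators are an assumption.
PATTERN = re.compile(r"cnt\s+(v[0-9]+\.8b), v0\.8b\n\s*addv\s+b0, \1\n")


def matches(body: str) -> bool:
    """Return True if the function body contains the expected cnt/addv pair."""
    return PATTERN.search(body) is not None


# Hypothetical function bodies, written by hand for this sketch.
same_reg = "\tcnt\tv0.8b, v0.8b\n\taddv\tb0, v0.8b\n\tret\n"
other_reg = "\tcnt\tv31.8b, v0.8b\n\taddv\tb0, v31.8b\n\tret\n"
mismatch = "\tcnt\tv31.8b, v0.8b\n\taddv\tb0, v30.8b\n\tret\n"

for name, body in [("same_reg", same_reg),
                   ("other_reg", other_reg),
                   ("mismatch", mismatch)]:
    print(name, matches(body))
```

The first two bodies are accepted because the temporary may be any vector register as long as addv consumes the one cnt produced; the third is rejected because the two instructions disagree on the temporary.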
