On Tue, Aug 20, 2024 at 11:18 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Sandiford <richard.sandif...@arm.com> writes:
> > Andrew Pinski <quic_apin...@quicinc.com> writes:
> >> When CSSC is not enabled, 128bit popcount can be implemented
> >> just via the vector (v16qi) cnt instruction followed by a reduction,
> >> like how the 64bit one is currently implemented instead of
> >> splitting into 2 64bit popcount.
> >>
> >> Build and tested for aarch64-linux-gnu.
> >>
> >>      PR target/113042
> >>
> >> gcc/ChangeLog:
> >>
> >>      * config/aarch64/aarch64.md (popcountti2): New define_expand.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>      * gcc.target/aarch64/popcnt10.c: New test.
> >>      * gcc.target/aarch64/popcnt9.c: New test.
> >
> > OK if there are no other comments in the next 24 hours.
>
> Sorry, only thought about it later, but:

Yes that is a good idea since that would be the same code in the end
anyways and it is slightly cleaner.
I was not 100% sure if you removed your approval or approved it with
the changes so I submitted a new patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660960.html

Thanks,
Andrew Pinski

>
> >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> >> index 12dcc16529a..73506e71f43 100644
> >> --- a/gcc/config/aarch64/aarch64.md
> >> +++ b/gcc/config/aarch64/aarch64.md
> >> @@ -5378,6 +5378,22 @@ (define_expand "popcount<mode>2"
> >>      }
> >>  })
> >>
> >> +(define_expand "popcountti2"
> >> +  [(set (match_operand:TI 0 "register_operand")
> >> +    (popcount:TI (match_operand:TI 1 "register_operand")))]
>
> Could you try making the output :DI instead of :TI?  I'd expect
> internal-fn.cc to handle that correctly and extend the result to
> 128 bits where needed.
>
> That would make the dummy popcount rtx malformed, so I suppose
> the pattern should just be:
>
>   [(match_operand:DI 0 "register_operand")
>    (match_operand:TI 1 "register_operand")]
>
> >> +  "TARGET_SIMD && !TARGET_CSSC"
> >> +{
> >> +  rtx v = gen_reg_rtx (V16QImode);
> >> +  rtx v1 = gen_reg_rtx (V16QImode);
> >> +  emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
> >> +  emit_insn (gen_popcountv16qi2 (v1, v));
> >> +  rtx out = gen_reg_rtx (DImode);
> >> +  emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (out, v1));
>
> We could then use operands[0] directly as the output here.
>
> Thanks,
> Richard
>
> >> +  out = convert_to_mode (TImode, out, true);
> >> +  emit_move_insn (operands[0], out);
> >> +  DONE;
> >> +})

Reply via email to