On Tue, Aug 20, 2024 at 11:18 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Richard Sandiford <richard.sandif...@arm.com> writes: > > Andrew Pinski <quic_apin...@quicinc.com> writes: > >> When CSSC is not enabled, 128bit popcount can be implemented > >> just via the vector (v16qi) cnt instruction followed by a reduction, > >> like how the 64bit one is currently implemented instead of > >> splitting into 2 64bit popcount. > >> > >> Build and tested for aarch64-linux-gnu. > >> > >> PR target/113042 > >> > >> gcc/ChangeLog: > >> > >> * config/aarch64/aarch64.md (popcountti2): New define_expand. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.target/aarch64/popcnt10.c: New test. > >> * gcc.target/aarch64/popcnt9.c: New test. > > > > OK if there are no other comments in the next 24 hours. > > Sorry, only thought about it later, but:
Yes that is a good idea since that would be the same code in the end anyways and it is slightly cleaner. I was not 100% sure if you removed your approval or approved it with the changes so I submitted a new patch here: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660960.html Thanks, Andrew Pinski > > >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > >> index 12dcc16529a..73506e71f43 100644 > >> --- a/gcc/config/aarch64/aarch64.md > >> +++ b/gcc/config/aarch64/aarch64.md > >> @@ -5378,6 +5378,22 @@ (define_expand "popcount<mode>2" > >> } > >> }) > >> > >> +(define_expand "popcountti2" > >> + [(set (match_operand:TI 0 "register_operand") > >> + (popcount:TI (match_operand:TI 1 "register_operand")))] > > Could you try making the output :DI instead of :TI? I'd expect > internal-fn.cc to handle that correctly and extend the result to > 128 bits where needed. > > That would make the dummy popcount rtx malformed, so I suppose > the pattern should just be: > > [(match_operand:DI 0 "register_operand") > (match_operand:TI 1 "register_operand")] > > >> + "TARGET_SIMD && !TARGET_CSSC" > >> +{ > >> + rtx v = gen_reg_rtx (V16QImode); > >> + rtx v1 = gen_reg_rtx (V16QImode); > >> + emit_move_insn (v, gen_lowpart (V16QImode, operands[1])); > >> + emit_insn (gen_popcountv16qi2 (v1, v)); > >> + rtx out = gen_reg_rtx (DImode); > >> + emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (out, v1)); > > We could then use operands[0] directly as the output here. > > Thanks, > Richard > > >> + out = convert_to_mode (TImode, out, true); > >> + emit_move_insn (operands[0], out); > >> + DONE; > >> +})