Wilco Dijkstra <wilco.dijks...@arm.com> writes: > Hi Richard, > >> Could you go into more detail about what the before and after code >> looks like, and what combine is doing? Like you say, this sounds >> like a target-independent thing on face value. > > It is indeed, but it seems specific to instructions where we have range > information which allows it to remove a redundant sign-extend. > > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565 for the full details. > >> Either way, something like this needs a testcase. > > Sure I've added the testcase from pr93565, see below. > > Cheers, > Wilco > > > Although GCC should understand the limited range of clz/ctz/cls results, > Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary > sign extension. Avoid this by adding an explicit AND with 127 in the > patterns. Deepsjeng performance improves by ~0.6%. > > Bootstrap OK. > > ChangeLog: > 2020-02-04 Wilco Dijkstra <wdijk...@arm.com> > > PR rtl-optimization/93565 > * config/aarch64/aarch64.md (clz<mode>2): Mask the clz result. > (clrsb<mode>2): Likewise. > (ctz<mode>2): Likewise. > > * gcc.target/aarch64/pr93565.c: New test.
I think this only works because the (and ...) wrapper stops combine from matching a plain ctz: Failed to match this instruction: (set (reg:DI 100 [ _9+-4 ]) (ctz:DI (reg/v:DI 97 [ x ]))) I.e. the patch doesn't work by making the range explicit. It just works by wrapping the pattern in something that combine will never generate naturally. Something like (ashift:DI ... (const_int 0)) would give the same result. If I add a second pattern that is never generated directly but can be matched by combine: (define_insn_and_split "*ctz<mode>2" [(set (match_operand:GPI 0 "register_operand" "=r") (ctz:GPI (match_operand:GPI 1 "register_operand" "r")))] "" "#" "reload_completed" [(const_int 0)] " emit_insn (gen_rbit<mode>2 (operands[0], operands[1])); emit_insn (gen_clz<mode>2 (operands[0], operands[0])); DONE; ") then we get the double ctz again. So I don't think this is OK even as a workaround. We should fix it in target-independent code instead. Thanks, Richard