Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Hi Richard,
>
>> Could you go into more detail about what the before and after code
>> looks like, and what combine is doing?  Like you say, this sounds
>> like a target-independent thing on face value.
>
> It is indeed, but it seems specific to instructions where we have range
> information which allows it to remove a redundant sign-extend.
>
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565 for the full details.
>
>> Either way, something like this needs a testcase.
>
> Sure I've added the testcase from pr93565, see below.
>
> Cheers,
> Wilco
>
>
> Although GCC should understand the limited range of clz/ctz/cls results,
> Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
> sign extension.  Avoid this by adding an explicit AND with 127 in the
> patterns. Deepsjeng performance improves by ~0.6%.
>
> Bootstrap OK.
>
> ChangeLog:
> 2020-02-04  Wilco Dijkstra  <wdijk...@arm.com>
>
> PR rtl-optimization/93565
> * config/aarch64/aarch64.md (clz<mode>2): Mask the clz result.
> (clrsb<mode>2): Likewise.
> (ctz<mode>2): Likewise.
>
> * gcc.target/aarch64/pr93565.c: New test.

I think this only works because the (and ...) wrapper stops combine
from matching a plain ctz:

Failed to match this instruction:
(set (reg:DI 100 [ _9+-4 ])
    (ctz:DI (reg/v:DI 97 [ x ])))

I.e. the patch doesn't work by making the range explicit.  It just
works by wrapping the pattern in something that combine will never
generate naturally.  Something like (ashift:DI ... (const_int 0))
would give the same result.

If I add a second pattern that is never generated directly but can be
matched by combine:

(define_insn_and_split "*ctz<mode>2"
 [(set (match_operand:GPI           0 "register_operand" "=r")
       (ctz:GPI (match_operand:GPI  1 "register_operand" "r")))]
  ""
  "#"
  "reload_completed"
  [(const_int 0)]
  "
  emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
  emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
  DONE;
")

then we get the double ctz again.

So I don't think this is OK even as a workaround.  We should fix it in
target-independent code instead.

Thanks,
Richard

Reply via email to