https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-04-12
            Summary|Unoptimal jump threading    |Unoptimal uncprop with
                   |with assembler flag output  |assembler flag output
     Ever confirmed|0                           |1
          Component|middle-end                  |tree-optimization

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We expand from

  __asm__ __volatile__("int3" : "=@ccz" success_4);
  if (success_4 != 0)
    goto <bb 4>; [66.00%]
  else
    goto <bb 5>; [34.00%]
;;    succ:       5 
;;                4

;;   basic block 4, loop depth 0
;;    pred:       3
;;                2
  __asm__ __volatile__("" :  :  : "memory");
;;    succ:       5

;;   basic block 5, loop depth 0
;;    pred:       3
;;                4
  # _1 = PHI <success_4(3), 1(4)>
  return _1;

and it's not PHI-opt "getting in the way"; rather, RTL expansion places
the copy of 'success_4' for edge 3->5 before the conditional branch
instead of into a new BB.  I suppose splitting critical edges would fix
it (at the expense of some extra blocks and unconditional jumps).
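
For reference, an edge is critical when its source block has more than one successor and its destination block has more than one predecessor. GCC's own test for this is the `EDGE_CRITICAL_P` macro. The toy Python sketch below applies that test to the pred/succ lists in the dump. The dictionaries, the `split_edge` helper and the new block number 6 are made up for illustration. Edges out of bb 2 are left out because bb 2's successor list is not shown. The PHI argument `success_4(3)` in bb 5 arrives along edge 3->5, which is critical, so splitting that edge gives its copy a block of its own.

```python
# Toy CFG copied from the ";; pred:" / ";; succ:" lines of the dump.
# bb 2's successors are not shown there, so edges out of bb 2 are omitted.
SUCCS = {3: [5, 4], 4: [5]}
PREDS = {4: [3, 2], 5: [3, 4]}


def is_critical(src: int, dest: int) -> bool:
    """Critical: src has several successors and dest several predecessors,
    so code for the edge fits at neither the end of src nor the start of dest."""
    return len(SUCCS[src]) >= 2 and len(PREDS[dest]) >= 2


def split_edge(src: int, dest: int, new_bb: int) -> None:
    """Insert an empty block new_bb on src->dest (hypothetical helper)."""
    SUCCS[src] = [new_bb if s == dest else s for s in SUCCS[src]]
    SUCCS[new_bb] = [dest]
    PREDS[new_bb] = [src]
    PREDS[dest] = [new_bb if p == src else p for p in PREDS[dest]]


critical = [(s, d) for s, succs in SUCCS.items() for d in succs if is_critical(s, d)]
print(critical)  # [(3, 5), (3, 4)]

# Give the PHI copy "_1 = success_4" for edge 3->5 its own block (number 6 is made up).
split_edge(3, 5, 6)
print(is_critical(3, 6), is_critical(6, 5))  # False False
```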

Note that clang seems to propagate the constant equivalence, which we
instead un-propagate (uncprop).  With -fdisable-tree-uncprop1 you get the
expected code:

foo:
.LFB0:
        .cfi_startproc
        cmpl    $-1, %edi
        je      .L8
.L2:
        movl    $1, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L8:
        xorl    %eax, %eax
#APP
# 6 "t.c" 1
        int3
# 0 "" 2
#NO_APP
        je      .L2
        ret
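
Reading the dump, the PHI argument from bb 3 was presumably the constant 0 before uncprop ran. bb 3 only reaches bb 5 on the false edge of `if (success_4 != 0)`, which matches the `xorl %eax, %eax` in the listing. The toy Python model below shows the rewrite uncprop performs: a constant PHI argument is replaced by an SSA name known to hold that constant on the incoming edge. The data layout is invented for illustration and is not GCC's actual pass.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Equiv:
    """On CFG edge src->dest, SSA name `name` is known to equal `value`."""
    src: int
    dest: int
    name: str
    value: int


def uncprop_phi(phi_args: dict[int, int | str], dest: int,
                equivs: list[Equiv]) -> dict[int, int | str]:
    """Toy uncprop: replace each constant PHI argument with an SSA name
    that holds the same constant on the corresponding incoming edge."""
    result = dict(phi_args)
    for pred, arg in phi_args.items():
        if isinstance(arg, str):
            continue  # already an SSA name, nothing to un-propagate
        for eq in equivs:
            if (eq.src, eq.dest) == (pred, dest) and eq.value == arg:
                result[pred] = eq.name
                break
    return result


# On the false edge 3->5 of "if (success_4 != 0)", success_4 == 0.
equivs = [Equiv(src=3, dest=5, name="success_4", value=0)]

before = {3: 0, 4: 1}  # _1 = PHI <0(3), 1(4)>  (inferred pre-uncprop form)
after = uncprop_phi(before, dest=5, equivs=equivs)
print(after)  # {3: 'success_4', 4: 1}
```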

What uncprop doesn't understand is that copying success requires
materializing it (it lives only in CC).  That is why it prefers the copy
over a zero: it assumes the zero needs materializing while the copy
does not.

The RTL pipeline also does not seem good enough at scheduling or sinking
a CC consumer across another CC consumer, or even at realizing that the
result is constant on the only edge where it is needed.

It might be possible to just special-case (bool) ASM defs in uncprop,
but that would be a heuristic.  I'm not sure whether we can portably
identify CC-mode constraints.
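
Here is a sketch of the special case floated above, assuming uncprop could see how each candidate SSA name is defined. `SsaDef`, its `asm_constraint` field, the name `x_2` and the helper functions are hypothetical, not GCC internals. The string test only recognizes the `=@cc<cond>` spelling used in this PR. Whether that identifies CC-mode outputs on every target is exactly the open question.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SsaDef:
    """Hypothetical summary of how an SSA name is defined."""
    name: str
    asm_constraint: str | None = None  # e.g. "=@ccz" for an asm flag output


def lives_only_in_flags(d: SsaDef) -> bool:
    # Purely textual check for the "=@cc<cond>" flag-output syntax; it does
    # not answer whether CC-mode outputs can be identified portably.
    return d.asm_constraint is not None and d.asm_constraint.startswith("=@cc")


def prefer_name_over_constant(d: SsaDef) -> bool:
    """Heuristic sketch: only replace a constant PHI argument by d.name when
    using d.name does not force materializing it out of the flags."""
    return not lives_only_in_flags(d)


success_4 = SsaDef("success_4", asm_constraint="=@ccz")
other = SsaDef("x_2")  # hypothetical ordinary SSA name
print(prefer_name_over_constant(success_4), prefer_name_over_constant(other))  # False True
```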
