https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-04-12
            Summary|Unoptimal jump threading    |Unoptimal uncprop with
                   |with assembler flag output  |assembler flag output
     Ever confirmed|0                           |1
          Component|middle-end                  |tree-optimization

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We expand from

  __asm__ __volatile__("int3" : "=@ccz" success_4);
  if (success_4 != 0)
    goto <bb 4>; [66.00%]
  else
    goto <bb 5>; [34.00%]
;;    succ:       5
;;                4

;;   basic block 4, loop depth 0
;;    pred:       3
;;                2
  __asm__ __volatile__("" :  :  : "memory");
;;    succ:       5

;;   basic block 5, loop depth 0
;;    pred:       3
;;                4
  # _1 = PHI <success_4(3), 1(4)>
  return _1;

and it's not PHI-opt "getting in the way" but instead RTL expansion
placing the edge 3->4 copy of 'success_4' before the conditional branch
rather than to a new BB.  I suppose if we'd split critical edges that
would fix it (at the expense of some extra blocks and unconditional jumps).

Note that clang seems to propagate the constant equivalence which we
instead un-propagate.

With -fdisable-tree-uncprop1 you'll get the expected code:

foo:
.LFB0:
        .cfi_startproc
        cmpl    $-1, %edi
        je      .L8
.L2:
        movl    $1, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L8:
        xorl    %eax, %eax
#APP
# 6 "t.c" 1
        int3
# 0 "" 2
#NO_APP
        je      .L2
        ret

what uncprop doesn't understand is that copying success requires to
materialize it (it's just in CC), that's the reason it prefers that over
a zero (because zero also needs materializing).  And the RTL pipeline is
not good enough in scheduling/sinking a CC consumer across another
CC consumer it seems (or even realizing the result is constant on the
only needed edge).

It might be possible to just special-case (bool) ASM defs in uncprop,
but that would be a heuristic.  Not sure if we can portably identify
CC mode constraints.
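
For readers without the testcase at hand, here is a minimal sketch of the constant equivalence discussed above. The original t.c is not quoted in this message, so the source shape is only inferred from the GIMPLE dump; the names flag_is_zero, foo_copy and foo_const are hypothetical, and a harmless "testl" replaces the "int3" from the report so that the code can actually be run. foo_copy returns the flag value on the early-exit path (the form uncprop produces), while foo_const returns the literal 0 that the value is known to equal there (the form clang appears to keep).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the asm in the report.  It sets ZF when
   x == 0 and reads it back through a flag output operand.  The report
   uses "int3", which would trap at run time, so "testl" is an
   assumption made purely for illustration. */
static inline bool flag_is_zero(int x)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__x86_64__) || defined(__i386__))
    bool success;
    __asm__ __volatile__("testl %1, %1" : "=@ccz"(success) : "r"(x));
    return success;
#else
    return x == 0;  /* fallback where x86 flag outputs are unavailable */
#endif
}

/* Un-propagated form: the early return yields 'success' itself, so the
   value living in the condition codes has to be materialized. */
bool foo_copy(int a, int x)
{
    if (a == -1) {
        bool success = flag_is_zero(x);
        if (!success)
            return success;  /* known to be 0 on this edge */
    }
    __asm__ __volatile__("" : : : "memory");
    return true;
}

/* Propagated form: the same edge returns the constant directly. */
bool foo_const(int a, int x)
{
    if (a == -1) {
        if (!flag_is_zero(x))
            return false;
    }
    __asm__ __volatile__("" : : : "memory");
    return true;
}
```

Both functions return the same value for every input; only the operand on the early-exit edge differs. To compare code generation, one could compile such a file with something like "gcc -O2 -S -fdisable-tree-uncprop1" (the -O2 level is an assumption; the pass-disabling flag is the one named in the comment).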