https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78200

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
OTOH we _do_ have initial RTL

(insn 167 166 168 20 (set (reg:CCGOC 17 flags)
        (compare:CCGOC (reg/v:DI 217 [ red_cost ])
            (const_int 0 [0]))) "pbeampp.c":42 -1
     (nil))
(jump_insn 168 167 169 20 (set (pc)
        (if_then_else (ge (reg:CCGOC 17 flags)
                (const_int 0 [0]))
            (label_ref 175)
            (pc))) "pbeampp.c":42 -1
     (int_list:REG_BR_PROB 6400 (nil))
 -> 175)
;;  succ:       21 [36.0%]  (FALLTHRU)
;;              23 [64.0%]

;; basic block 23, loop depth 2, count 0, freq 1067, maybe hot
;; Invalid sum of incoming frequencies 1216, should be 1067
;;  prev block 22, next block 24, flags: (NEW, REACHABLE, RTL, MODIFIED, VISITED)
;;  pred:       20 [64.0%]
(code_label 175 173 176 23 98 "" [1 uses])
(note 176 175 177 23 [bb 23] NOTE_INSN_BASIC_BLOCK)
(insn 177 176 178 23 (set (reg:CCNO 17 flags)
        (compare:CCNO (reg/v:DI 217 [ red_cost ])
            (const_int 0 [0]))) "pbeampp.c":42 -1
     (nil))
(insn 178 177 179 23 (set (reg:QI 273)
        (gt:QI (reg:CCNO 17 flags)
            (const_int 0 [0]))) "pbeampp.c":42 -1
     (nil))
(insn 179 178 180 23 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 273)
            (const_int 0 [0]))) "pbeampp.c":42 -1
     (nil))
(jump_insn 180 179 587 23 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref 196)
            (pc))) "pbeampp.c":42 -1
     (int_list:REG_BR_PROB 3300 (nil))
 -> 196)

that is, it compares in a sensible order that allows for combining (which
apparently is what causes the code to run slower, for reasons not yet
explored).

Expanding the other way around does not have any justification IMHO,
and thus the "fix" would have to go into the later stage where we combine
the compare with the one on the backedge.

The issue is CSE2 which does

(insn 167 166 168 21 (set (reg:CC 17 flags)
        (compare:CC (reg/v:DI 217 [ red_cost ])
            (const_int 0 [0]))) "pbeampp.c":42 8 {*cmpdi_1}
     (nil))
(jump_insn 168 167 169 21 (set (pc)
        (if_then_else (ge (reg:CC 17 flags)
                (const_int 0 [0]))
            (label_ref 175)
            (pc))) "pbeampp.c":42 635 {*jcc_1}
     (expr_list:REG_DEAD (reg:CC 17 flags)
        (int_list:REG_BR_PROB 6400 (nil)))
 -> 175)
...
(insn 178 176 179 24 (set (reg:QI 273)
        (gt:QI (reg:CC 17 flags)
            (const_int 0 [0]))) "pbeampp.c":42 631 {*setcc_qi}
     (expr_list:REG_DEAD (reg:CC 17 flags)
        (nil)))

thus it changes the earlier compare to CCmode and re-uses that flags value
for the setcc.  Note it's still a mystery to me why this is slower (and I
have not reproduced the slowdown myself yet).
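
To illustrate why a single full compare can feed both consumers, here is a
minimal C sketch.  It is not GCC code: struct flags, cmp_zero, cond_ge and
cond_gt are hypothetical names that only model the ZF/SF/OF bits an x86
compare against zero sets, and the signed conditions that jge and setg read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the flag bits produced by comparing x with 0. */
struct flags {
    bool zf;  /* zero flag: x == 0 */
    bool sf;  /* sign flag: x < 0 */
    bool of;  /* overflow flag: always clear when subtracting 0 */
};

/* One full compare against zero, like (compare:CC (reg) (const_int 0)). */
static struct flags cmp_zero(int64_t x)
{
    struct flags f = { x == 0, x < 0, false };
    return f;
}

/* Signed conditions as x86 defines them for jge and setg. */
static bool cond_ge(struct flags f) { return f.sf == f.of; }
static bool cond_gt(struct flags f) { return !f.zf && f.sf == f.of; }

/* Both consumers read the same flags value; no second compare is needed. */
static void check_shared_compare(void)
{
    const int64_t samples[] = { INT64_MIN, -1, 0, 1, INT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        struct flags f = cmp_zero(samples[i]);
        assert(cond_ge(f) == (samples[i] >= 0));
        assert(cond_gt(f) == (samples[i] > 0));
    }
}
```

Calling check_shared_compare() passes for all samples, which matches the
idea that the ge jump and the gt setcc can share one flags definition.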

Then we combine it to

(insn 167 166 168 18 (set (reg:CC 17 flags)
        (compare:CC (reg/v:DI 217 [ red_cost ])
            (const_int 0 [0]))) "pbeampp.c":42 8 {*cmpdi_1}
     (nil))
(jump_insn 168 167 169 18 (set (pc)
        (if_then_else (ge (reg:CC 17 flags)
                (const_int 0 [0]))
            (label_ref 175)
            (pc))) "pbeampp.c":42 635 {*jcc_1}
     (int_list:REG_BR_PROB 6400 (nil))
 -> 175)
;;  succ:       19 [36.0%]  (FALLTHRU)
;;              20 [64.0%]


;; basic block 20, loop depth 0, count 0, freq 1067, maybe hot
;; Invalid sum of incoming frequencies 1216, should be 1067
(jump_insn 180 179 587 20 (set (pc)
        (if_then_else (le (reg:CC 17 flags)
                (const_int 0 [0]))
            (label_ref:DI 196)
            (pc))) "pbeampp.c":42 635 {*jcc_1}
     (int_list:REG_BR_PROB 3300 (expr_list:REG_DEAD (reg:CCZ 17 flags)
            (nil)))
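
The fold that combine performs on the second branch can be checked with a
small C sketch.  This only models the condition logic, under the assumption
that insns 178-180 mean "branch if (red_cost > 0) == 0"; taken_before and
taken_after are hypothetical names, not anything in GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shape before combine: materialize (x > 0) in a byte, branch if it is 0. */
static bool taken_before(int64_t x)
{
    uint8_t tmp = (x > 0);  /* models (set (reg:QI 273) (gt ...)) */
    return tmp == 0;        /* models the compare of reg 273 with 0 and the eq jump */
}

/* Shape after combine: branch directly on x <= 0. */
static bool taken_after(int64_t x)
{
    return x <= 0;          /* models the (le (reg:CC 17 flags) ...) jump */
}

/* The two shapes must agree for every input, including the extremes. */
static void check_fold(void)
{
    const int64_t samples[] = { INT64_MIN, -1, 0, 1, INT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(taken_before(samples[i]) == taken_after(samples[i]));
}
```

The sketch says nothing about why the combined form would run slower; it
only confirms that the le jump is a valid replacement for the setcc, test
and eq jump sequence.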
