https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78200
--- Comment #11 from Venkataramanan <venkataramanan.kumar at amd dot com> ---
Hi Richard,
On a Haswell machine, the original run time with -O3 -mavx2 -mprefer-avx2 is:
real 2m35.325s
user 2m35.257s
sys 0m0.070s
Changing the assembly from
.L98:
jle .L97
cmpl $2, %r9d
jne .L97
.L99:
to
.L98:
cmpl $2, %r9d
jne .L97
cmpq $0, %rdi
jle .L97
.L99:
improves the run time to:
real 2m27.224s
user 2m27.138s
sys 0m0.087s
> -----Original Message-----
> From: rguenth at gcc dot gnu.org
> Sent: Wednesday, November 9, 2016 6:02 PM
> To: Kumar, Venkataramanan <venkataramanan.kumar at amd dot com>
> Subject: [Bug rtl-optimization/78200] [7 Regression] 429.mcf of cpu2006
> regresses in GCC trunk for avx2 target.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78200
>
> --- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
> OTOH we _do_ have initial RTL
>
> (insn 167 166 168 20 (set (reg:CCGOC 17 flags)
> (compare:CCGOC (reg/v:DI 217 [ red_cost ])
> (const_int 0 [0]))) "pbeampp.c":42 -1
> (nil))
> (jump_insn 168 167 169 20 (set (pc)
> (if_then_else (ge (reg:CCGOC 17 flags)
> (const_int 0 [0]))
> (label_ref 175)
> (pc))) "pbeampp.c":42 -1
> (int_list:REG_BR_PROB 6400 (nil))
> -> 175)
> ;; succ: 21 [36.0%] (FALLTHRU)
> ;; 23 [64.0%]
>
> ;; basic block 23, loop depth 2, count 0, freq 1067, maybe hot
> ;; Invalid sum of incoming frequencies 1216, should be 1067
> ;; prev block 22, next block 24, flags: (NEW, REACHABLE, RTL, MODIFIED, VISITED)
> ;; pred: 20 [64.0%]
> (code_label 175 173 176 23 98 "" [1 uses])
> (note 176 175 177 23 [bb 23] NOTE_INSN_BASIC_BLOCK)
> (insn 177 176 178 23 (set (reg:CCNO 17 flags)
> (compare:CCNO (reg/v:DI 217 [ red_cost ])
> (const_int 0 [0]))) "pbeampp.c":42 -1
> (nil))
> (insn 178 177 179 23 (set (reg:QI 273)
> (gt:QI (reg:CCNO 17 flags)
> (const_int 0 [0]))) "pbeampp.c":42 -1
> (nil))
> (insn 179 178 180 23 (set (reg:CCZ 17 flags)
> (compare:CCZ (reg:QI 273)
> (const_int 0 [0]))) "pbeampp.c":42 -1
> (nil))
> (jump_insn 180 179 587 23 (set (pc)
> (if_then_else (eq (reg:CCZ 17 flags)
> (const_int 0 [0]))
> (label_ref 196)
> (pc))) "pbeampp.c":42 -1
> (int_list:REG_BR_PROB 3300 (nil))
> -> 196)
>
> that is, it compares in a sensible order allowing for combining (which
> apparently is what causes the code to run slower for not yet explored
> reasons).
>
> Expanding the other way around does not have any justification IMHO and thus
> the "fix" would be to the later stage where we combine the compare with the
> one on the backedge.
>
> The issue is CSE2 which does
>
> (insn 167 166 168 21 (set (reg:CC 17 flags)
> (compare:CC (reg/v:DI 217 [ red_cost ])
> (const_int 0 [0]))) "pbeampp.c":42 8 {*cmpdi_1}
> (nil))
> (jump_insn 168 167 169 21 (set (pc)
> (if_then_else (ge (reg:CC 17 flags)
> (const_int 0 [0]))
> (label_ref 175)
> (pc))) "pbeampp.c":42 635 {*jcc_1}
> (expr_list:REG_DEAD (reg:CC 17 flags)
> (int_list:REG_BR_PROB 6400 (nil)))
> -> 175)
> ...
> (insn 178 176 179 24 (set (reg:QI 273)
> (gt:QI (reg:CC 17 flags)
> (const_int 0 [0]))) "pbeampp.c":42 631 {*setcc_qi}
> (expr_list:REG_DEAD (reg:CC 17 flags)
> (nil)))
>
> thus changes the earlier compare to CC and re-uses that CCmode. Note it's still
> a mystery to me why this is slower (and I did not reproduce that myself yet).
>
> Then we combine it to
>
> (insn 167 166 168 18 (set (reg:CC 17 flags)
> (compare:CC (reg/v:DI 217 [ red_cost ])
> (const_int 0 [0]))) "pbeampp.c":42 8 {*cmpdi_1}
> (nil))
> (jump_insn 168 167 169 18 (set (pc)
> (if_then_else (ge (reg:CC 17 flags)
> (const_int 0 [0]))
> (label_ref 175)
> (pc))) "pbeampp.c":42 635 {*jcc_1}
> (int_list:REG_BR_PROB 6400 (nil))
> -> 175)
> ;; succ: 19 [36.0%] (FALLTHRU)
> ;; 20 [64.0%]
>
>
> ;; basic block 20, loop depth 0, count 0, freq 1067, maybe hot
> ;; Invalid sum of incoming frequencies 1216, should be 1067
> (jump_insn 180 179 587 20 (set (pc)
> (if_then_else (le (reg:CC 17 flags)
> (const_int 0 [0]))
> (label_ref:DI 196)
> (pc))) "pbeampp.c":42 635 {*jcc_1}
> (int_list:REG_BR_PROB 3300 (expr_list:REG_DEAD (reg:CCZ 17 flags)
> (nil)))
>