https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115144
Bug ID: 115144
Summary: [15 Regression] 2% performance regression for some codes with r15-518-g99b1daae18c095
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hp at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Target Milestone: ---
Target: cris-elf

...and it also regresses gcc.target/cris/pr93372-47.c. That test case exists primarily as a regression test for a fixed delay-slot-filling bug, but it also guards against code-quality regressions. I followed the comment in pr93372-47.c about what to investigate if it regresses, and found a fairly large regression:

Commit r15-518-g99b1daae18c095, "tree-optimization/114589 - remove profile based sink heuristics", caused an almost 2% performance regression for certain code. This was measured from simulator output while executing gcc.c-torture/execute/arith-rand-ll.c compiled for cris-elf with -O2 -march=v10:

r15-0517: Basic clock cycles, total @: 13025734
r15-0518: Basic clock cycles, total @: 13279004

I also inspected the simulator output, and the bulk is indeed in random_bitstring (i.e. not in the div and mod library functions).

Would you say that ivopts matters here? Here is the same measurement with -fno-ivopts added:

r15-0517: Basic clock cycles, total @: 13008338
r15-0518: Basic clock cycles, total @: 13330520

...so the regression is even larger: almost 2.5%.

It could be argued that arith-rand-ll.c is not a reliable performance test, so I also ran r15-0517 and r15-0518 on coremark, which paints a different picture:

r15-0517: Basic clock cycles, total @: 5022704
r15-0518: Basic clock cycles, total @: 5021785

So there it is a performance win, if a small one (~0.02%). The same, with -fno-ivopts:

r15-0517: Basic clock cycles, total @: 5641650
r15-0518: Basic clock cycles, total @: 5640721

Still a win, only slightly smaller (still ~0.02%).
Judging from coremark, no general conclusion can be drawn about the performance of r15-518. However, I know from other performance investigations that "double register"-heavy code such as arith-rand-ll.c has different characteristics on CRIS than other test code, in this case coremark. Maybe something can be done to improve on r15-518 for this type of code, or maybe it has exposed problems for other ports. I'm therefore not going to close this as WONTFIX myself right away. I'll also use this PR as an anchor when dealing with the gcc.target/cris/pr93372-47.c regression (likely by xfailing it).