https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114238
Bug ID: 114238
Summary: Multiple 554.roms_r run-time regressions (4%-20%)
since r14-9193-ga0b1798042d033
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux, aarch64-linux
Target: x86_64-linux, aarch64-linux
Our LNT instance has detected that runtime of benchmark 554.roms_r
from the SPEC 2017 FPUrate suite regressed on all machines on most
configurations by 4-20%.
For example:
simple -O2 -flto on AMD Zen 3 regressed by 14%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.537.0
on Zen2 -O2 -flto regression is the worst, 20%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.537.0
-Ofast -march=native -flto on AMD Zen 4 regressed by 7%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=959.537.0
-Ofast -march=native on AMD Zen 2 regressed by 17%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.537.0
but it also happens on Intel Skylake:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=800.537.0
or Aarch64:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=587.537.0
and there are smaller regressions on the PGO configurations too.
I have bisected the Zen3 -O2 -flto case to r14-9193-ga0b1798042d033
(Richard Biener: tree-optimization/114074 - CHREC multiplication and
undefined overflow). I have then verified that the zen 4 -Ofast
-march=natice -flto and zen 2 -Ofast -march=native cases have also
been introduces by it:
commit a0b1798042d033fd2cc2c806afbb77875dd2909b
Author: Richard Biener <[email protected]>
Date: Mon Feb 26 13:33:21 2024 +0100
tree-optimization/114074 - CHREC multiplication and undefined overflow
When folding a multiply CHRECs are handled like {a, +, b} * c
is {a*c, +, b*c} but that isn't generally correct when overflow
invokes undefined behavior. The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.
I've used simple early outs for INTEGER_CSTs and otherwise use
a range-query since we lack a tree_expr_nonpositive_p and
get_range_pos_neg isn't a good fit.
PR tree-optimization/114074
* tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
* tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
Handle poly vs. non-poly multiplication correctly with respect
to undefined behavior on overflow.
* gcc.dg/torture/pr114074.c: New testcase.
* gcc.dg/pr68317.c: Adjust expected location of diagnostic.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Do not expect
loop to be vectorized.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)