https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84201
john henning <mailboxnotfound at yahoo dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mailboxnotfound at yahoo dot com

--- Comment #12 from john henning <mailboxnotfound at yahoo dot com> ---
I contributed to the development of benchmark 549.fotonik3d_r.  The opinions
herein are my own, not necessarily SPEC's.

Martin Liška wrote in comment 9:

> adjusted tolerance for the test from 1e-10 to 1e-9

That change would have been highly desirable, if this problem had been found
prior to the release of CPU 2017.  Unfortunately, post-release, it is very
difficult to change a SPEC CPU benchmark, because of the philosophy of "no
moving targets".  To be clear, a rule-compliant SPEC CPU run is not allowed to
change the tolerance.

Why wasn't the problem found before release?

Although GCC was tested prior to release of CPU 2017, the circumstances that
led to this problem were not encountered.  As Steve Ellcey wrote in comment 5,
the problem comes and goes with various optimizations:

> -Ofast fails
> -Ofast -fno-unsafe-math-optimizations works
> -Ofast -fno-tree-loop-vectorize works
> -O3 works

In addition, it appears that the problem:

- Depends on the particular architecture.  In my tests today, it disappears
  when I remove -march=native

- Does not happen in the presence of FDO (Feedback-Directed Optimization, i.e.
  -fprofile-generate / -fprofile-use), typically used by "peak" tuning (see
  https://www.spec.org/cpu2017/Docs/overview.html#Q16 for info on base vs.
  peak).

SPEC CPU Examples

SPEC CPU 2017 users who start from the Example config files for GCC in
$SPEC/config are unlikely to hit the problem because most of the GCC Example
config files use -O3 (not -Ofast) for base.  However, if a user modifies the
example to use -Ofast -march=native then they will need to also add
-fno-unsafe-math-optimizations or -fno-tree-loop-vectorize.
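The flag interaction reported so far can be condensed into a small check.  The
sketch below is only an illustration: the helper name may_hit_problem is
hypothetical (it is not part of SPEC CPU or GCC), and it encodes nothing more
than the observations in this report, namely that -Ofast together with
-march=native fails unless FDO or one of the two workaround flags is present.
It does no real command-line parsing, so a later flag that overrides an earlier
one (for example -Ofast followed by -O3) is not modeled.

```python
import shlex

# Flags that made the problem disappear in the tests reported in this bug.
WORKAROUND_FLAGS = {"-fno-unsafe-math-optimizations", "-fno-tree-loop-vectorize"}
# FDO flags; the problem was not observed when FDO was in use.
FDO_FLAGS = {"-fprofile-generate", "-fprofile-use"}


def may_hit_problem(flags: str) -> bool:
    """Return True if the flag string matches the failing combination
    described in this report (hypothetical helper, observations only)."""
    tokens = set(shlex.split(flags))
    # The failure was only seen with both -Ofast and -march=native.
    if "-Ofast" not in tokens or "-march=native" not in tokens:
        return False
    # Either workaround flag was reported to make the run pass.
    if tokens & WORKAROUND_FLAGS:
        return False
    # Accept both "-fprofile-use" and "-fprofile-use=<path>" style flags.
    if any(tok.split("=")[0] in FDO_FLAGS for tok in tokens):
        return False
    return True


if __name__ == "__main__":
    for candidate in (
        "-Ofast -flto -march=native",
        "-Ofast -flto -march=native -fno-unsafe-math-optimizations",
        "-O3 -flto -march=native",
    ):
        print(candidate, "->", may_hit_problem(candidate))
```

A check like this could be run over the FOPTIMIZE / OPTIMIZE lines of a config
file before starting a long base run, but it is no substitute for validating
the benchmark output.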
Working around the problem

The same-for-all rule -- https://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags
-- says that all benchmarks in a suite of a given language must use the same
base flags.  Here are several examples of how a config file could obey that
rule while working around the problem:

Option (a) - In base, avoid -Ofast for Fortran

   default=base:
      COPTIMIZE      = -Ofast -flto -march=native
      CXXOPTIMIZE    = -Ofast -flto -march=native
      FOPTIMIZE      = -O3 -flto -march=native

Option (b) - In base, avoid -march=native for Fortran

   default=base:
      COPTIMIZE      = -Ofast -flto -march=native
      CXXOPTIMIZE    = -Ofast -flto -march=native
      FOPTIMIZE      = -Ofast -flto

Option (c) - Turn off tree loop vectorizer for Fortran

   default=base:
      OPTIMIZE       = -Ofast -flto -march=native
   fprate,fpspeed=base:
      FOPTIMIZE      = -fno-tree-loop-vectorize

Option (d) - Turn off unsafe math for Fortran

   default=base:
      OPTIMIZE       = -Ofast -flto -march=native
   fprate,fpspeed=base:
      FOPTIMIZE      = -fno-unsafe-math-optimizations

Performance impact

The performance impact of the options above will be system dependent, and may
depend on how hard you exercise the system (e.g. one copy or many copies).

For a particular system tested by one particular person running only one copy,
here are the results of the above 4 options, normalized to option (a):

Performance of the Fortran benchmarks in SPECrate2017 FP
Normalized to option (a).  Higher is better

                503.    507.   521.  527.  549.     554.
                bwaves  cactu  wrf   cam4  fotonik  roms   geomean
(a) O3          1.00    1.00   1.00  1.00  1.00     1.00   1.000
(b) no native   1.31     .82   1.31  1.01   .93      .98   1.045
(c) no vect     1.31    1.01    .94   .89   .90      .85    .973
(d) no unsafe    .99    1.02   1.36  1.01  1.03     1.01   1.064

Given the above, at the moment option (d) seems best.

Next steps

I will add a summary of this discussion to
https://www.spec.org/cpu2017/Docs/benchmarks/549.fotonik3d_r.html

Thank you Martin, Martin, Steve, and Richard for the clarity in this report.
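For anyone who wants to repeat the comparison on their own system, the geomean
column can be recomputed from the per-benchmark ratios as sketched below.  The
numbers are copied from the table above; the script itself is an illustration,
not a SPEC tool.  Because the inputs are already rounded to two decimals, the
recomputed geomeans may differ from the table in the last digit.

```python
import math

# Normalized ratios copied from the table above.
# Column order: bwaves, cactu, wrf, cam4, fotonik, roms.
RATIOS = {
    "(a) O3":        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00],
    "(b) no native": [1.31, 0.82, 1.31, 1.01, 0.93, 0.98],
    "(c) no vect":   [1.31, 1.01, 0.94, 0.89, 0.90, 0.85],
    "(d) no unsafe": [0.99, 1.02, 1.36, 1.01, 1.03, 1.01],
}


def geomean(values):
    """Geometric mean, computed via logs to avoid a long product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    for option, ratios in RATIOS.items():
        print(f"{option:14s} {geomean(ratios):.3f}")
```

To compare a different set of flags, replace the ratio lists with your own
measurements normalized to the same baseline.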