https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Di Zhao <dz...@gcc.gnu.org>:

https://gcc.gnu.org/g:8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652

commit r14-6559-g8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652
Author: Di Zhao <diz...@os.amperecomputing.com>
Date:   Fri Dec 15 03:22:32 2023 +0800

    Consider fully pipelined FMA in get_reassociation_width

    Add a new parameter param_fully_pipelined_fma.  If it is non-zero,
    reassociation considers the benefit of parallelizing FMA's
    multiplication part and addition part, assuming FMUL and FMA use the
    same units that can also do FADD.

    With the patch and new option, there's ~2% improvement in spec2017
    508.namd on AmpereOne.  (The other options are "-Ofast -mcpu=ampere1
    -flto".)

            PR tree-optimization/110279

    gcc/ChangeLog:

            * doc/invoke.texi: New parameter fully-pipelined-fma.
            * params.opt: New parameter fully-pipelined-fma.
            * tree-ssa-reassoc.cc (get_mult_latency_consider_fma): Return
            the latency of MULT_EXPRs that can't be hidden by the FMAs.
            (get_reassociation_width): Search for a smaller width
            considering the benefit of fully pipelined FMA.
            (rank_ops_for_fma): Return the number of MULT_EXPRs.
            (reassociate_bb): Pass the number of MULT_EXPRs to
            get_reassociation_width; avoid calling
            get_reassociation_width twice.

    gcc/testsuite/ChangeLog:

            * gcc.dg/pr110279-2.c: New test.
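
For readers who want a picture of the trade-off behind get_reassociation_width, the following is a minimal sketch and is not taken from the patch or from gcc.dg/pr110279-2.c. The function names sum_serial and sum_two_chains are hypothetical. The two functions spell out by hand two ways of associating the same sum of products; which shape GCC actually emits depends on the target, the optimization flags, and the reassociation parameters. GCC reassociates floating-point additions only when -fassociative-math is in effect (-Ofast enables it through -ffast-math), and a multiply feeding an add may be contracted into an FMA under -ffp-contract=fast.

```c
/* Illustrative sketch only: hypothetical kernels, not code from the patch. */

/* Width 1: every addition depends on the previous one.  With FMA
   contraction this can become a single chain of four dependent FMAs,
   with no stand-alone multiply or add. */
double sum_serial(double acc, const double *a, const double *b)
{
    return (((acc + a[0] * b[0]) + a[1] * b[1]) + a[2] * b[2]) + a[3] * b[3];
}

/* Width 2: two shorter chains that can run in parallel.  With FMA
   contraction this can become one stand-alone multiply, three FMAs,
   and one final add that joins the chains. */
double sum_two_chains(double acc, const double *a, const double *b)
{
    double lo = (acc + a[0] * b[0]) + a[1] * b[1]; /* chain 1 */
    double hi = a[2] * b[2] + a[3] * b[3];         /* chain 2 */
    return lo + hi;
}
```

Assuming the parameter is spelled as in the ChangeLog, it would be enabled with something like "gcc -Ofast -mcpu=ampere1 --param fully-pipelined-fma=1", where the first two options are the ones quoted in the commit message. The wider form shortens the dependency chain but adds a separate multiply and add; on a core where FMUL, FMA and FADD share fully pipelined units, that is the cost the new width search weighs.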