https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Di Zhao <dz...@gcc.gnu.org>:

https://gcc.gnu.org/g:8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652

commit r14-6559-g8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652
Author: Di Zhao <diz...@os.amperecomputing.com>
Date:   Fri Dec 15 03:22:32 2023 +0800

    Consider fully pipelined FMA in get_reassociation_width

    Add a new parameter param_fully_pipelined_fma. If it is non-zero,
    reassociation considers the benefit of parallelizing FMA's
    multiplication part and addition part, assuming FMUL and FMA use the
    same units that can also do FADD.

    With the patch and the new option, there is a ~2% improvement in
    spec2017 508.namd on AmpereOne. (The other options are
    "-Ofast -mcpu=ampere1 -flto".)

            PR tree-optimization/110279

    gcc/ChangeLog:

            * doc/invoke.texi: New parameter fully-pipelined-fma.
            * params.opt: New parameter fully-pipelined-fma.
            * tree-ssa-reassoc.cc (get_mult_latency_consider_fma): Return
            the latency of MULT_EXPRs that can't be hidden by the FMAs.
            (get_reassociation_width): Search for a smaller width
            considering the benefit of fully pipelined FMA.
            (rank_ops_for_fma): Return the number of MULT_EXPRs.
            (reassociate_bb): Pass the number of MULT_EXPRs to
            get_reassociation_width; avoid calling
            get_reassociation_width twice.

    gcc/testsuite/ChangeLog:

            * gcc.dg/pr110279-2.c: New test.
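
For illustration only (this is not the code from the commit or from
gcc.dg/pr110279-2.c), the trade-off that the commit message describes can be
sketched with two hand-written associations of the same sum of products. The
function names sum_width1 and sum_width2 are hypothetical. The first form is
one dependent chain in which every addition can fuse with a multiplication
into an FMA; the second form splits the work into two independent chains, at
the cost of one standalone FADD at the end. Which shape the reassociation pass
actually picks depends on the target and on the parameter value, so treat this
as a sketch of the idea rather than a description of GCC's output.

```c
/* Width 1: a single dependent chain.  After a*b, each "+ x*y" step can be
   emitted as one FMA, but every FMA has to wait for the previous result.  */
double sum_width1(double a, double b, double c, double d,
                  double e, double f, double g, double h)
{
    return ((a * b + c * d) + e * f) + g * h;
}

/* Width 2: two independent chains.  Each half is one multiplication plus
   one FMA and the halves can execute in parallel, but joining them needs
   an extra plain addition that cannot be fused.  */
double sum_width2(double a, double b, double c, double d,
                  double e, double f, double g, double h)
{
    double lo = a * b + c * d;   /* chain 0 */
    double hi = e * f + g * h;   /* chain 1 */
    return lo + hi;              /* standalone FADD */
}
```

Going by the parameter name in the ChangeLog, the new heuristic would
presumably be enabled with "--param fully-pipelined-fma=1" alongside the
options quoted in the commit message. The two forms can round differently in
general, which is why this kind of reassociation applies only under options
such as -Ofast that permit it.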
