[Bug rtl-optimization/119046] [15 Regression] Performance drop from not forming lane-wise FMLAs with Eigen library

cvs-commit at gcc dot gnu.org via Gcc-bugs Wed, 05 Mar 2025 07:23:27 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119046


--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kyrylo Tkachov <[email protected]>:

https://gcc.gnu.org/g:db76482175c4e76db273d7fb3a00ae0f932529a6

commit r15-7832-gdb76482175c4e76db273d7fb3a00ae0f932529a6
Author: Kyrylo Tkachov <[email protected]>
Date:   Thu Feb 27 09:00:25 2025 -0800

    PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

    In this testcase, late-combine was failing to merge:
            dup     v31.4s, v31.s[3]
            fmla    v30.4s, v31.4s, v29.4s
    into the lane-wise fmla form.
    This is because, under the hood, late-combine calls may_trap_p on the
    dup insn, and this ended up returning true for the insn:
    (set (reg:V4SF 152 [ _32 ])
            (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
                    (parallel:V4SF [
                            (const_int 3 [0x3])]))))

    Although may_trap_p_1 correctly reasoned that vec_duplicate and vec_select
    of floating-point modes can't trap, it assumed that the V4SF parallel can
    trap.  The correct behaviour is to recurse into the vector inside the
    PARALLEL and check its sub-expressions.  This patch adjusts may_trap_p_1
    to do just that.
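To illustrate why recursing into the PARALLEL matters, here is a toy model of the idea. It is not GCC's rtlanal.cc: the node layout, the enums, toy_may_trap, and the fix_parallel switch are all hypothetical stand-ins. The sketch assumes that any code it does not list explicitly is floating-point arithmetic that may trap in a float mode; the real may_trap_p_1 handles many more codes and options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified RTL-like node (not GCC's real rtx). */
enum code { REG, CONST_INT, PARALLEL, VEC_SELECT, VEC_DUPLICATE, PLUS };
enum mode { VOIDmode, SImode, SFmode, V4SFmode };

struct rtx {
    enum code code;
    enum mode mode;
    int nops;                     /* number of sub-expressions used */
    const struct rtx *ops[2];
};

static bool float_mode_p(enum mode m)
{
    return m == SFmode || m == V4SFmode;
}

/* Toy analogue of may_trap_p_1: classify the node by its code, then
   recurse into its operands.  fix_parallel selects the patched
   behaviour (assumption: this mirrors the idea, not the real code). */
static bool toy_may_trap(const struct rtx *x, bool fix_parallel)
{
    switch (x->code) {
    case REG:
    case CONST_INT:
        return false;
    case VEC_SELECT:
    case VEC_DUPLICATE:
        /* Pure data movement: cannot raise FP exceptions by itself. */
        break;
    case PARALLEL:
        if (fix_parallel)
            break;   /* after the fix: only the elements are checked */
        /* fall through: before the fix, treated like FP arithmetic */
    default:
        if (float_mode_p(x->mode))
            return true;
        break;
    }
    for (int i = 0; i < x->nops; i++)
        if (toy_may_trap(x->ops[i], fix_parallel))
            return true;
    return false;
}
```

Building the dup insn's source from the RTL dump above (vec_duplicate of a vec_select whose selector is a V4SF-mode parallel holding const_int 3), toy_may_trap reports "may trap" without the fix and "cannot trap" with it, while genuine FP arithmetic such as a V4SF plus is still reported as trapping.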
    With this check, the above insn is no longer deemed to be trapping and is
    propagated into the FMLA, giving:
            fmla    vD.4s, vA.4s, vB.s[3]
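The fold is valid because both sequences compute the same values: dup broadcasts lane 3 of vB into every lane, and the vector fmla then adds vA[i] * t[i] into vD[i], whereas the lane-wise fmla reads vB.s[3] directly and saves the dup. The scalar C model below is an illustrative sketch; the v4sf type and helper names are hypothetical, and it uses an unfused multiply-add where the hardware FMLA rounds only once, so its rounding need not match real hardware in general.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scalar stand-in for a 128-bit vector of four floats (.4s). */
typedef struct { float s[4]; } v4sf;

/* dup vT.4s, vB.s[lane]: copy one lane of b into every lane of the result. */
static v4sf dup_lane(v4sf b, int lane)
{
    assert(lane >= 0 && lane < 4);
    v4sf t;
    for (int i = 0; i < 4; i++)
        t.s[i] = b.s[lane];
    return t;
}

/* fmla vD.4s, vA.4s, vB.4s: d[i] += a[i] * b[i].
   Modelled as an unfused multiply-add; real FMLA is fused. */
static v4sf fmla_vec(v4sf d, v4sf a, v4sf b)
{
    for (int i = 0; i < 4; i++)
        d.s[i] = d.s[i] + a.s[i] * b.s[i];
    return d;
}

/* fmla vD.4s, vA.4s, vB.s[lane]: d[i] += a[i] * b[lane], no dup needed. */
static v4sf fmla_lane(v4sf d, v4sf a, v4sf b, int lane)
{
    assert(lane >= 0 && lane < 4);
    for (int i = 0; i < 4; i++)
        d.s[i] = d.s[i] + a.s[i] * b.s[lane];
    return d;
}

/* Lane-by-lane equality check used to compare the two sequences. */
static bool v4sf_equal(v4sf x, v4sf y)
{
    for (int i = 0; i < 4; i++)
        if (x.s[i] != y.s[i])
            return false;
    return true;
}
```

With small integer inputs (exact in float), fmla_vec(d, a, dup_lane(b, 3)) and fmla_lane(d, a, b, 3) agree in every lane, which is the equivalence late-combine relies on once it knows the dup cannot trap.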

    Bootstrapped and tested on aarch64-none-linux-gnu.
    Apparently this also fixes a regression in
    gcc.target/aarch64/vmul_element_cost.c that I observed.

    Signed-off-by: Kyrylo Tkachov <[email protected]>

    gcc/

            PR rtl-optimization/119046
            * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping.

    gcc/testsuite/

            PR rtl-optimization/119046
            * gcc.target/aarch64/pr119046.c: New test.
