Hi all, In this PR late-combine was failing to merge: dup v31.4s, v31.s[3] fmla v30.4s, v31.4s, v29.4s into the lane-wise fmla form. This is because late-combine checks may_trap_p under the hood on the dup insn. This ended up returning true for the insn: (set (reg:V4SF 152 [ _32 ]) (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ]) (parallel:V4SF [ (const_int 3 [0x3])]))))
Although mem_trap_p correctly reasoned that vec_duplicate and vec_select of floating-point modes can't trap, it assumed that the V4SF parallel can trap. The correct behaviour is to recurse into vector inside the PARALLEL and check the sub-expression. This patch adjusts may_trap_p_1 to do just that. With this check the above insn is not deemed to be trapping and is propagated into the FMLA giving: fmla vD.4s, vA.4s, vB.s[3] The testcase is reduced from a larger example so it has some C++ cruft around the main part but still demonstrates the desired effect. Bootstrapped and tested on aarch64-none-linux-gnu. Apparently this also fixes a regression in gcc.target/aarch64/vmul_element_cost.c that I observed. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com> gcc/ PR rtl-optimization/119046 * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping. gcc/testsuite/ PR rtl-optimization/119046 * g++.target/aarch64/pr119046.C: New test.
0001-PR-rtl-optimization-119046-Don-t-mark-PARALLEL-RTXes.patch
Description: 0001-PR-rtl-optimization-119046-Don-t-mark-PARALLEL-RTXes.patch