https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119100
--- Comment #15 from Paul-Antoine Arras <parras at gcc dot gnu.org> ---
https://godbolt.org/z/dczr15Eo4

Reduced from 538.imagick

We get the following assembly:

        fld     fa5,0(a4)
        vfmv.v.f        v2,fa5
        vfmacc.vv       v1,v3,v2

But since r16-1659-g92e1893e0155b6 we should get:

        fld     fa5,0(a4)
        vfmacc.vf       v1,fa5,v2

The combine dump shows:

Trying 48 -> 53:
   48: r160:DF=[r140:DI]
   53: r190:RVVM1DF=vec_duplicate(r160:DF)
      REG_DEAD r160:DF
Successfully matched this instruction:
(set (reg:RVVM1DF 190 [ vect__7.24_89 ])
    (vec_duplicate:RVVM1DF (mem:DF (reg:DI 140 [ ivtmp.56 ]) [0 MEM[(const double *)_14]+0 S8 A64])))
allowing combination of insns 48 and 53
original costs 28 + 8 = 36
replacement cost 8
deferring deletion of insn with uid = 48.
modifying insn i3    53: r190:RVVM1DF=vec_duplicate([r140:DI])
deferring rescan insn with uid = 53.

Trying 53 -> 54:
   53: r190:RVVM1DF=vec_duplicate([r140:DI])
   54: r154:RVVM1DF=r189:RVVM1DF*r190:RVVM1DF+r154:RVVM1DF
      REG_DEAD r190:RVVM1DF
      REG_DEAD r189:RVVM1DF
Failed to match this instruction:
(set (reg:RVVM1DF 154 [ vect_result$red_108.39 ])
    (plus:RVVM1DF (mult:RVVM1DF (vec_duplicate:RVVM1DF (mem:DF (reg:DI 140 [ ivtmp.56 ]) [0 MEM[(const double *)_14]+0 S8 A64]))
            (reg:RVVM1DF 189 [ vect__28.29_84 ]))
        (reg:RVVM1DF 154 [ vect_result$red_108.39 ])))

So the first combination (48 -> 53) folds a memory reference into a
vec_duplicate. As a result, the second combination (53 -> 54) fails due to the
folded memory reference.
Is there a way to defer the mem folding to late_combine?

***

Additionally, the replacement cost of (48 -> 53) is equal to 8 which is much
less than 28 + 8 = 36. This is probably wrong, even for vlse -- which is not
used here anyway.
Something along these lines would certainly give a better cost estimate:

--- gcc/config/riscv/riscv.cc
+++ gcc/config/riscv/riscv.cc
@@ -3999,7 +3999,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
   switch (GET_CODE (x))
     {
     case VEC_DUPLICATE:
-      *total = gr2vr_cost * COSTS_N_INSNS (1);
+      if (MEM_P (XEXP (x, 0)))
+       {
+         riscv_rtx_costs (XEXP (x, 0), GET_MODE_INNER (mode),
+                          VEC_DUPLICATE, opno, total, speed);
+         *total += scalar2vr_cost * COSTS_N_INSNS (1);
+       }
+      else
+       *total = scalar2vr_cost * COSTS_N_INSNS (1);
       break;
     case IF_THEN_ELSE:
       {

But that is not enough to prevent the replacement.