https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119100

--- Comment #15 from Paul-Antoine Arras <parras at gcc dot gnu.org> ---
https://godbolt.org/z/dczr15Eo4
Reduced from 538.imagick

We get the following assembly:

fld     fa5,0(a4)
vfmv.v.f        v2,fa5
vfmacc.vv       v1,v3,v2

But since r16-1659-g92e1893e0155b6 we should get:

fld     fa5,0(a4)
vfmacc.vf       v1,fa5,v2


The combine dump shows:

Trying 48 -> 53:
   48: r160:DF=[r140:DI]
   53: r190:RVVM1DF=vec_duplicate(r160:DF)
      REG_DEAD r160:DF
Successfully matched this instruction:
(set (reg:RVVM1DF 190 [ vect__7.24_89 ])
    (vec_duplicate:RVVM1DF (mem:DF (reg:DI 140 [ ivtmp.56 ]) [0 MEM[(const double *)_14]+0 S8 A64])))
allowing combination of insns 48 and 53
original costs 28 + 8 = 36
replacement cost 8
deferring deletion of insn with uid = 48.
modifying insn i3    53: r190:RVVM1DF=vec_duplicate([r140:DI])
deferring rescan insn with uid = 53.

Trying 53 -> 54:
   53: r190:RVVM1DF=vec_duplicate([r140:DI])
   54: r154:RVVM1DF=r189:RVVM1DF*r190:RVVM1DF+r154:RVVM1DF
      REG_DEAD r190:RVVM1DF
      REG_DEAD r189:RVVM1DF
Failed to match this instruction:
(set (reg:RVVM1DF 154 [ vect_result$red_108.39 ])
    (plus:RVVM1DF (mult:RVVM1DF (vec_duplicate:RVVM1DF (mem:DF (reg:DI 140 [ ivtmp.56 ]) [0 MEM[(const double *)_14]+0 S8 A64]))
            (reg:RVVM1DF 189 [ vect__28.29_84 ]))
        (reg:RVVM1DF 154 [ vect_result$red_108.39 ])))



So the first combination (48 -> 53) folds the memory load into the
vec_duplicate. As a result, the second combination (53 -> 54) fails to match,
because the vec_duplicate operand it tries to fold into the multiply-add is now
a mem rather than a register.

Is there a way to defer the mem folding to late_combine?

***

Additionally, the replacement cost of (48 -> 53) is 8, which is much less than
the original 28 + 8 = 36. This is probably wrong, even for vlse -- which is not
used here anyway.

Something along these lines would certainly give a better cost estimate:

--- gcc/config/riscv/riscv.cc
+++ gcc/config/riscv/riscv.cc
@@ -3999,7 +3999,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
            switch (GET_CODE (x))
              {
              case VEC_DUPLICATE:
-               *total = gr2vr_cost * COSTS_N_INSNS (1);
+               if (MEM_P (XEXP (x, 0)))
+                 {
+                   riscv_rtx_costs (XEXP (x, 0), GET_MODE_INNER (mode),
+                                    VEC_DUPLICATE, opno, total, speed);
+                   *total += scalar2vr_cost * COSTS_N_INSNS (1);
+                 }
+               else
+                 *total = scalar2vr_cost * COSTS_N_INSNS (1);
                break;
              case IF_THEN_ELSE:
                { 

But that is not enough to prevent the replacement.
