Makes sense to me. It looks like the combine will always take place if GR2VR is 0, 1 or 2 for now.
I tried to customize the cost here to make the combine fail, but that did not work
with the change below.

+  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
+    cost_val = 1;
+  }
+
+  if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
+      && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 1)))) {
+    cost_val = 8;
+  }
+
+  if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
+      && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 1)))) {
+    cost_val = 2; // never picked up during combine.
+  }

I think this is slightly too simple (as you also showed) because COST_N_INSNS (1) == 4. We need to make sure to match the full patterns and then set their costs. There must be three distinct costing paths: vadd.vv, vadd.vx and vmv.vx.
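To illustrate the three paths, here is a minimal sketch on a toy expression type. Everything prefixed with toy_ or TOY_ is a hypothetical stand-in and not GCC's RTL API; only the COST_N_INSNS scaling mirrors GCC. Note that the vec_duplicate operand itself has a vector mode (the dump shows vec_duplicate:RVVM1DI), so a check for "vector op0, scalar op1" on the PLUS cannot fire, and the vadd.vx shape has to be tested before the generic vector-vector case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same scaling as GCC's COST_N_INSNS: one instruction is 4 cost units.  */
#define COST_N_INSNS(n) ((n) * 4)

/* Hypothetical stand-ins for RTL codes; not GCC's real types.  */
enum toy_code { TOY_REG, TOY_VEC_DUPLICATE, TOY_PLUS };

struct toy_rtx
{
  enum toy_code code;
  bool vector_mode;           /* true for an RVV mode, false for a scalar mode */
  const struct toy_rtx *op0;  /* first operand, or NULL */
  const struct toy_rtx *op1;  /* second operand, or NULL */
};

enum toy_path { PATH_NONE, PATH_VMV_VX, PATH_VADD_VX, PATH_VADD_VV };

/* Decide which of the three costing paths an expression belongs to.  */
static enum toy_path
toy_classify (const struct toy_rtx *x)
{
  /* vmv.vx: broadcast of a scalar register.  */
  if (x->code == TOY_VEC_DUPLICATE && x->op0 && !x->op0->vector_mode)
    return PATH_VMV_VX;

  if (x->code == TOY_PLUS && x->op0 && x->op1)
    {
      /* vadd.vx: one operand is a broadcast.  This must come first because
         the broadcast itself has a vector mode.  */
      if (x->op0->code == TOY_VEC_DUPLICATE
          || x->op1->code == TOY_VEC_DUPLICATE)
        return PATH_VADD_VX;

      /* vadd.vv: two plain vector operands.  */
      if (x->op0->vector_mode && x->op1->vector_mode)
        return PATH_VADD_VV;
    }

  return PATH_NONE;
}
```

Each path can then get its own cost instead of sharing one cost_val.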


With that change the add costs 8 both in the original sequence and in the
replacement (see the combine log below).
Thus, combine will always keep the replacement.
trying to combine definition of r135 in:
   11: r135:RVVM1DI=vec_duplicate(r150:DI)
into:
   18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
      REG_DEAD r146:RVVM1DI
successfully matched this instruction to *add_vx_rvvm1di:
(set (reg:RVVM1DI 147 [ vect__6.8_16 ])
    (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
        (reg:RVVM1DI 146)))
original cost = 4 + 32 (weighted: 262.469092), replacement cost = 32 (weighted: 258.909092); keeping replacement
rescanning insn with uid = 18.
updating insn 18 in-place
verify found no changes in insn with uid = 18.
deleting insn 11
deleting insn with uid = 11.

Based on the above, I made another attempt to understand how late-combine uses
RTX_COST:
set the cost of vadd v1, (vec_dup(x1)) to 8 and everything else to 1.

+  if (rcode == PLUS) {
+    rtx arg0 = XEXP (x, 0);
+    rtx arg1 = XEXP (x, 1);
+
+    if (riscv_v_ext_mode_p (GET_MODE (arg1))
+       && GET_CODE (arg0) == VEC_DUPLICATE) {
+       cost_val = 8;
+    }
+  }

Then late-combine rejects the replacement as expected. Thus, the condition for
the combine to fail looks like vmv.vx + vadd.vv < vadd.vx here, if my
understanding is correct. If so, it will also impact the --param we would like
to introduce: a single --param=gr2vr_cost=XXX is not enough to make sure that
the condition holds, and we may need --param=vv_cost/vx_cost=XXX.

Yes, first we must deconstruct all relevant patterns as above for PLUS.
The basic cost for the add is COST_N_INSNS (1) == 4. If one operand is a VEC_DUPLICATE then we increase the basic cost by GR2VR * COST_N_INSNS (1). Is that not sufficient for the combination to not happen?
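As a minimal sketch of that formula (toy_vadd_cost is a hypothetical helper name, not an existing function in the backend, and gr2vr stands for whatever GR2VR value ends up being used):

```c
/* Same scaling as GCC's COST_N_INSNS: one instruction is 4 cost units.  */
#define COST_N_INSNS(n) ((n) * 4)

/* Hypothetical helper: cost of a vector add.  The base cost is one
   instruction; if one operand is a VEC_DUPLICATE (the vadd.vx form) we
   add GR2VR further instructions' worth of cost on top.  */
static int
toy_vadd_cost (int has_vec_duplicate_operand, int gr2vr)
{
  int cost = COST_N_INSNS (1);

  if (has_vec_duplicate_operand)
    cost += gr2vr * COST_N_INSNS (1);

  return cost;
}
```

With this, vadd.vv always costs 4, while vadd.vx costs 4, 8 and 12 for GR2VR = 0, 1 and 2.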

If we have
 for (...)
   {
     vmv.vx
     vadd.vv
   }

then this should be combined even if COST (vmv.vx) + COST (vadd.vv) == COST (vadd.vx) because we save an instruction and need to perform the broadcast anyway.

For
 vmv.vx
 for (...)
   {
     vadd.vv
   }

the combination should not take place (when the costs are equal) because of the frequency consideration in late-combine's costing.
When COST (vmv.vx) + COST (vadd.vv) > COST (vadd.vx) it should take place.
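The frequency consideration can be sketched with a simplified model that weights each insn's cost by the execution frequency of its block. This is not late-combine's actual code; the toy_ names are hypothetical, keeping the replacement on a tie is an assumption, and the frequencies in the usage note are back-computed from the weighted values in the dumps rather than taken from the compiler.

```c
#include <stdbool.h>

/* One insn in the toy model: its cost and the frequency of its block.  */
struct toy_insn_cost
{
  int cost;
  double freq;
};

/* Sum of cost * frequency over a sequence of insns.  */
static double
toy_weighted_cost (const struct toy_insn_cost *insns, int n)
{
  double sum = 0.0;

  for (int i = 0; i < n; i++)
    sum += insns[i].cost * insns[i].freq;

  return sum;
}

/* Assumption: a replacement is kept when it is not more expensive than
   the original sequence once both are weighted by frequency.  */
static bool
toy_keep_replacement (double original, double replacement)
{
  return replacement <= original;
}
```

Assuming a frequency of about 0.89 for the block holding the vec_duplicate and about 8.09 for the loop body, the model gives 4 * 0.89 + 32 * 8.09 for the original and 32 * 8.09 for the replacement when the add costs 32 in both forms, so the replacement is kept. With 4 + 4 against 32 the original is far cheaper and the replacement is rejected, which matches both dumps.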

We first need to get these basic building blocks correct before considering something else.

Btw, is there a way to attach the cost to the define_insn_and_split itself? That might be
easier than catching it in RTX_COST, up to a point.

trying to combine definition of r135 in:
   11: r135:RVVM1DI=vec_duplicate(r150:DI)
into:
   18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
      REG_DEAD r146:RVVM1DI
successfully matched this instruction to *add_vx_rvvm1di:
(set (reg:RVVM1DI 147 [ vect__6.8_16 ])
    (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
        (reg:RVVM1DI 146)))
original cost = 4 + 4 (weighted: 35.923637), replacement cost = 32 (weighted: 258.909092); rejecting replacement

In the end we'll have to capture all patterns (predicated and unpredicated) anyway so just tackling the unsplit ones only helps so much.

--
Regards
Robin
