> IMHO this is how it should roughly look like:

> With GR2VR=2:
> vadd.vv: cost 4 = COST_N_INSNS (1)
> vmv.v.x: cost COST_N_INSNS (GR2VR) = 8
> vadd.vx: cost 4 + GR2VR * COST_N_INSNS (1) = 12

> With GR2VR=1:
> vadd.vv: cost 4
> vmv.v.x: cost 4
> vadd.vx: cost 4 + 4 = 8

> With GR2VR=0:
> vadd.vv: cost 4
> vmv.v.x: cost 4 (or less?)
> vadd.vx: cost 4 + 0 * COST_N_INSNS (1) = 4

> So with GR2VR > 0 we would perform the replacement when the frequency is 
> similar.  With GR2VR == 0 we should always do.

> vmv.v.x cost 4 with GR2VR cost == 0 is a bit debatable but setting it to 0 
> would also seem off.

Make sense to me, it looks like the combine will always take place if GR2VR is 
0, 1 or 2 for now.
I am try to customize the cost here to make it fail to combine but get failed 
with below change.

+  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
+    cost_val = 1;
+  }
+
+  if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
+      && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 1)))) {
+    cost_val = 8;
+  }
+
+  if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
+      && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 1)))) {
+    cost_val = 2; // never picked up during combine.
+  }

It takes 8 for original cost as well as replacement(see below combine log). 
Thus, it will be always
keep replacement during combine. 

  51   │ trying to combine definition of r135 in:
  52   │    11: r135:RVVM1DI=vec_duplicate(r150:DI)
  53   │ into:
  54   │    18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
  55   │       REG_DEAD r146:RVVM1DI
  56   │ successfully matched this instruction to *add_vx_rvvm1di:
  57   │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
  58   │     (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
  59   │         (reg:RVVM1DI 146)))
  60   │ original cost = 4 + 32 (weighted: 262.469092), replacement cost = 32 
(weighted: 258.909092); keeping replacement
  61   │ rescanning insn with uid = 18.
  62   │ updating insn 18 in-place
  63   │ verify found no changes in insn with uid = 18.
  64   │ deleting insn 11
  65   │ deleting insn with uid = 11.

Based on above, I have another try to understand how late-combine leverage the 
RTX_COST.
Aka, set vadd v1, (vec_dup(x1)) to 8 and others to 1.

+  if (rcode == PLUS) {
+    rtx arg0 = XEXP (x, 0);
+    rtx arg1 = XEXP (x, 1);
+
+    if (riscv_v_ext_mode_p (GET_MODE (arg1))
+       && GET_CODE (arg0) == VEC_DUPLICATE) {
+       cost_val = 8;
+    }
+  }

Then the late-combine reject the replacement as expected. Thus, the condition 
failed to combine may
Looks like vmv.vx + vadd.vv < vadd.vx here if my understanding is correct.  If 
so, it will also impact the
--param we would like to introduce, a single --param=gr2vr_cost=XXX is not good 
enough to make sure that
the condition is true, we may need --param=vv_cost/vx_cost=XXX.

Btw, is there any approach to set the cost attached to the 
define_insn_and_split? Which may be more
friendly to catch it from RTX_COST up to a point.

  51   │ trying to combine definition of r135 in:
  52   │    11: r135:RVVM1DI=vec_duplicate(r150:DI)
  53   │ into:
  54   │    18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
  55   │       REG_DEAD r146:RVVM1DI
  56   │ successfully matched this instruction to *add_vx_rvvm1di:
  57   │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
  58   │     (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
  59   │         (reg:RVVM1DI 146)))
  60   │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 32 
(weighted: 258.909092); rejecting replacement
  61   │

Pan


-----Original Message-----
From: Robin Dapp <rdapp....@gmail.com> 
Sent: Thursday, April 24, 2025 8:13 PM
To: Li, Pan2 <pan2...@intel.com>; Robin Dapp <rdapp....@gmail.com>; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Robin Dapp 
<rdapp....@gmail.com>
Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx 
on GR2VR cost

>> Ah, I see, thanks.  So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 
>> while vadd.vx costs 1 + 2, making it cheaper?
>
> Yes, looks we need to just assign the GR2VR when vec_dup. I also tried diff 
> cost here to see
> the impact to late-combine.
>
> +  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
> +    cost_val = get_vector_costs ()->regmove->GR2VR;
> +  }
>
> ---- cut line ----
>
> If GR2VR is 2, we will perform the combine as below.
>
>  51 trying to combine definition of r135 in:
>  52    11: r135:RVVM1DI=vec_duplicate(r150:DI)
>  53 into:
>  54    18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
>  55       REG_DEAD r146:RVVM1DI
>  56 successfully matched this instruction to *add_vx_rvvm1di:
>  57 (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
>  58     (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
>  59         (reg:RVVM1DI 146)))
>  60 original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4 
> (weighted: 32.363637); keeping replacement
>  61 rescanning insn with uid = 18.
>  62 updating insn 18 in-place
>  63 verify found no changes in insn with uid = 18.
>  64 deleting insn 11
>  65 deleting insn with uid = 11.
>
> ---- cut line ----
>
> If GR2VR is 1, we will perform the combine as below.
>
>   51   │ trying to combine definition of r135 in:
>   52   │    11: r135:RVVM1DI=vec_duplicate(r150:DI)
>   53   │ into:
>   54   │    18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
>   55   │       REG_DEAD r146:RVVM1DI
>   56   │ successfully matched this instruction to *add_vx_rvvm1di:
>   57   │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
>   58   │     (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
>   59   │         (reg:RVVM1DI 146)))
>   60   │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 4 
> (weighted: 32.363637); keeping replacement
>   61   │ rescanning insn with uid = 18.
>   62   │ updating insn 18 in-place
>   63   │ verify found no changes in insn with uid = 18.
>   64   │ deleting insn 11
>   65   │ deleting insn with uid = 11.
>
> ---- cut line ----
>
> If GR2VR is 0, it will be normalized to 1 as below, thus the combine log 
> looks like the same as above.

IMHO this is how it should roughly look like:

With GR2VR=2:
vadd.vv: cost 4 = COST_N_INSNS (1)
vmv.v.x: cost COST_N_INSNS (GR2VR) = 8
vadd.vx: cost 4 + GR2VR * COST_N_INSNS (1) = 12

With GR2VR=1:
vadd.vv: cost 4
vmv.v.x: cost 4
vadd.vx: cost 4 + 4 = 8

With GR2VR=0:
vadd.vv: cost 4
vmv.v.x: cost 4 (or less?)
vadd.vx: cost 4 + 0 * COST_N_INSNS (1) = 4

So with GR2VR > 0 we would perform the replacement when the frequency is 
similar.  With GR2VR == 0 we should always do.

vmv.v.x cost 4 with GR2VR cost == 0 is a bit debatable but setting it to 0 
would also seem off.

-- 
Regards
 Robin

Reply via email to