> IMHO this is how it should roughly look like:
> With GR2VR=2:
> vadd.vv: cost 4 = COST_N_INSNS (1)
> vmv.v.x: cost COST_N_INSNS (GR2VR) = 8
> vadd.vx: cost 4 + GR2VR * COST_N_INSNS (1) = 12
> With GR2VR=1:
> vadd.vv: cost 4
> vmv.v.x: cost 4
> vadd.vx: cost 4 + 4 = 8
> With GR2VR=0:
> vadd.vv: cost 4
> vmv.v.x: cost 4 (or less?)
> vadd.vx: cost 4 + 0 * COST_N_INSNS (1) = 4
> So with GR2VR > 0 we would perform the replacement when the frequency is
> similar. With GR2VR == 0 we should always do.
> vmv.v.x cost 4 with GR2VR cost == 0 is a bit debatable but setting it to 0
> would also seem off.
Make sense to me, it looks like the combine will always take place if GR2VR is
0, 1 or 2 for now.
I am try to customize the cost here to make it fail to combine but get failed
with below change.
+ if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
+ cost_val = 1;
+ }
+
+ if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
+ && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 1)))) {
+ cost_val = 8;
+ }
+
+ if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
+ && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 1)))) {
+ cost_val = 2; // never picked up during combine.
+ }
It takes 8 for original cost as well as replacement(see below combine log).
Thus, it will be always
keep replacement during combine.
51 │ trying to combine definition of r135 in:
52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI)
53 │ into:
54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
55 │ REG_DEAD r146:RVVM1DI
56 │ successfully matched this instruction to *add_vx_rvvm1di:
57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
59 │ (reg:RVVM1DI 146)))
60 │ original cost = 4 + 32 (weighted: 262.469092), replacement cost = 32
(weighted: 258.909092); keeping replacement
61 │ rescanning insn with uid = 18.
62 │ updating insn 18 in-place
63 │ verify found no changes in insn with uid = 18.
64 │ deleting insn 11
65 │ deleting insn with uid = 11.
Based on above, I have another try to understand how late-combine leverage the
RTX_COST.
Aka, set vadd v1, (vec_dup(x1)) to 8 and others to 1.
+ if (rcode == PLUS) {
+ rtx arg0 = XEXP (x, 0);
+ rtx arg1 = XEXP (x, 1);
+
+ if (riscv_v_ext_mode_p (GET_MODE (arg1))
+ && GET_CODE (arg0) == VEC_DUPLICATE) {
+ cost_val = 8;
+ }
+ }
Then the late-combine reject the replacement as expected. Thus, the condition
failed to combine may
Looks like vmv.vx + vadd.vv < vadd.vx here if my understanding is correct. If
so, it will also impact the
--param we would like to introduce, a single --param=gr2vr_cost=XXX is not good
enough to make sure that
the condition is true, we may need --param=vv_cost/vx_cost=XXX.
Btw, is there any approach to set the cost attached to the
define_insn_and_split? Which may be more
friendly to catch it from RTX_COST up to a point.
51 │ trying to combine definition of r135 in:
52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI)
53 │ into:
54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
55 │ REG_DEAD r146:RVVM1DI
56 │ successfully matched this instruction to *add_vx_rvvm1di:
57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
59 │ (reg:RVVM1DI 146)))
60 │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 32
(weighted: 258.909092); rejecting replacement
61 │
Pan
-----Original Message-----
From: Robin Dapp <[email protected]>
Sent: Thursday, April 24, 2025 8:13 PM
To: Li, Pan2 <[email protected]>; Robin Dapp <[email protected]>;
[email protected]
Cc: [email protected]; [email protected]; [email protected]; Chen,
Ken <[email protected]>; Liu, Hongtao <[email protected]>; Robin Dapp
<[email protected]>
Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx
on GR2VR cost
>> Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4
>> while vadd.vx costs 1 + 2, making it cheaper?
>
> Yes, looks we need to just assign the GR2VR when vec_dup. I also tried diff
> cost here to see
> the impact to late-combine.
>
> + if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
> + cost_val = get_vector_costs ()->regmove->GR2VR;
> + }
>
> ---- cut line ----
>
> If GR2VR is 2, we will perform the combine as below.
>
> 51 trying to combine definition of r135 in:
> 52 11: r135:RVVM1DI=vec_duplicate(r150:DI)
> 53 into:
> 54 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
> 55 REG_DEAD r146:RVVM1DI
> 56 successfully matched this instruction to *add_vx_rvvm1di:
> 57 (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
> 58 (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
> 59 (reg:RVVM1DI 146)))
> 60 original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4
> (weighted: 32.363637); keeping replacement
> 61 rescanning insn with uid = 18.
> 62 updating insn 18 in-place
> 63 verify found no changes in insn with uid = 18.
> 64 deleting insn 11
> 65 deleting insn with uid = 11.
>
> ---- cut line ----
>
> If GR2VR is 1, we will perform the combine as below.
>
> 51 │ trying to combine definition of r135 in:
> 52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI)
> 53 │ into:
> 54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
> 55 │ REG_DEAD r146:RVVM1DI
> 56 │ successfully matched this instruction to *add_vx_rvvm1di:
> 57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
> 58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
> 59 │ (reg:RVVM1DI 146)))
> 60 │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 4
> (weighted: 32.363637); keeping replacement
> 61 │ rescanning insn with uid = 18.
> 62 │ updating insn 18 in-place
> 63 │ verify found no changes in insn with uid = 18.
> 64 │ deleting insn 11
> 65 │ deleting insn with uid = 11.
>
> ---- cut line ----
>
> If GR2VR is 0, it will be normalized to 1 as below, thus the combine log
> looks like the same as above.
IMHO this is how it should roughly look like:
With GR2VR=2:
vadd.vv: cost 4 = COST_N_INSNS (1)
vmv.v.x: cost COST_N_INSNS (GR2VR) = 8
vadd.vx: cost 4 + GR2VR * COST_N_INSNS (1) = 12
With GR2VR=1:
vadd.vv: cost 4
vmv.v.x: cost 4
vadd.vx: cost 4 + 4 = 8
With GR2VR=0:
vadd.vv: cost 4
vmv.v.x: cost 4 (or less?)
vadd.vx: cost 4 + 0 * COST_N_INSNS (1) = 4
So with GR2VR > 0 we would perform the replacement when the frequency is
similar. With GR2VR == 0 we should always do.
vmv.v.x cost 4 with GR2VR cost == 0 is a bit debatable but setting it to 0
would also seem off.
--
Regards
Robin