Thanks Robin for comments and help.
> Yes, at first we must deconstruct all relevant patterns as above for PLUS.
> The basic cost for the add is COST_N_INSNS (1) == 4. If one operand is a
> VEC_DUPLICATE then we increase the basic cost by GR2VR * COST_N_INSNS (1).
> Is
> that not sufficient for the combination to not happen?
The below changes will make the late-combine fail to happen, aka
vmv(4) + vadd.vv(4) = 8 < plus (v, vec_dup(x)) (4 + GR2VR * 4 = 4 + 8 = 12)
+ if (riscv_v_ext_mode_p (mode))
+ {
+ cost = 1;
+ rtx op_0 = XEXP (x, 0);
+ rtx op_1 = XEXP (x, 1);
+
+ if (GET_CODE (op_0) == VEC_DUPLICATE
+ || GET_CODE (op_1) == VEC_DUPLICATE)
+ cost += get_vector_costs ()->regmove->GR2VR;
+
+ *total = COSTS_N_INSNS (cost);
+ return true;
+ }
But this is not that good enough here if my understanding is correct.
As vmv.v.x is somehow equivalent to vec_dup but doesn't ref GR2VR,
thus shall we introduce 3 options and enrich the get_vector_costs () instead?
Aka
--param=vv_alu_cost=?
--param=vx_alu_cost=?
--param=xv_mov_cost=?
And then pick up the value directly from the options, or let the option pollute
the get_vector_costs respectively. They should be something similar as blow
in riscv_rtx_costs (take add as example).
case PLUS:
if (op_0 is VEC_DUP or op_1 is VEC_DUP)
cost = get_vector_costs ()->regalu->GR2VR;
if (op_0 is reg && op_1 is reg)
cost = get_vector_costs ()->regalu->VR2VR;
case VEC_DUP:
cost = get_vector_costs ()->regmove->GR2VR;
> If we have
> for (...)
> {
> vmv.vx
> vadd.vv
> }
> then this should be combined even if COST (vmv.vx) + COST (vadd.vv) == COST
> (vadd.vx) because we save an instruction and need to perform the broadcast
> anyway.
> For
> vmv.vx
> for (...)
> {
> vadd.vv
> }
> the combination should not take place (when the costs are equal) because of
> the
> frequency consideration in late-combine's costing.
> When COST (vmv.vx) + COST (vadd.vv) > COST (vadd.vx) it should take place.
I see, will enrich the test cases for above two cases.
Pan
-----Original Message-----
From: Robin Dapp <[email protected]>
Sent: Monday, April 28, 2025 3:39 PM
To: Li, Pan2 <[email protected]>; Robin Dapp <[email protected]>;
[email protected]
Cc: [email protected]; [email protected]; [email protected]; Chen,
Ken <[email protected]>; Liu, Hongtao <[email protected]>; Robin Dapp
<[email protected]>
Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx
on GR2VR cost
> Make sense to me, it looks like the combine will always take place if GR2VR
> is 0, 1 or 2 for now.
> I am try to customize the cost here to make it fail to combine but get failed
> with below change.
>
> + if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
> + cost_val = 1;
> + }
> +
> + if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
> + && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 1)))) {
> + cost_val = 8;
> + }
> +
> + if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
> + && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 1)))) {
> + cost_val = 2; // never picked up during combine.
> + }
I think this slightly is too simple (as you also showed) due to COST_N_INSNS
(0) == 4. We need to make sure to match the full patterns and then set their
costs. There must be three distinct costing paths: vadd.vv, vadd.vx and
vmv.vx.
>
> It takes 8 for original cost as well as replacement(see below combine log).
> Thus, it will be always
> keep replacement during combine.
>
> 51 │ trying to combine definition of r135 in:
> 52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI)
> 53 │ into:
> 54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
> 55 │ REG_DEAD r146:RVVM1DI
> 56 │ successfully matched this instruction to *add_vx_rvvm1di:
> 57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
> 58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
> 59 │ (reg:RVVM1DI 146)))
> 60 │ original cost = 4 + 32 (weighted: 262.469092), replacement cost = 32
> (weighted: 258.909092); keeping replacement
> 61 │ rescanning insn with uid = 18.
> 62 │ updating insn 18 in-place
> 63 │ verify found no changes in insn with uid = 18.
> 64 │ deleting insn 11
> 65 │ deleting insn with uid = 11.
>
> Based on above, I have another try to understand how late-combine leverage
> the RTX_COST.
> Aka, set vadd v1, (vec_dup(x1)) to 8 and others to 1.
>
> + if (rcode == PLUS) {
> + rtx arg0 = XEXP (x, 0);
> + rtx arg1 = XEXP (x, 1);
> +
> + if (riscv_v_ext_mode_p (GET_MODE (arg1))
> + && GET_CODE (arg0) == VEC_DUPLICATE) {
> + cost_val = 8;
> + }
> + }
>
> Then the late-combine reject the replacement as expected. Thus, the condition
> failed to combine may
> Looks like vmv.vx + vadd.vv < vadd.vx here if my understanding is correct.
> If so, it will also impact the
> --param we would like to introduce, a single --param=gr2vr_cost=XXX is not
> good enough to make sure that
> the condition is true, we may need --param=vv_cost/vx_cost=XXX.
Yes, at first we must deconstruct all relevant patterns as above for PLUS.
The basic cost for the add is COST_N_INSNS (1) == 4. If one operand is a
VEC_DUPLICATE then we increase the basic cost by GR2VR * COST_N_INSNS (1). Is
that not sufficient for the combination to not happen?
If we have
for (...)
{
vmv.vx
vadd.vv
}
then this should be combined even if COST (vmv.vx) + COST (vadd.vv) == COST
(vadd.vx) because we save an instruction and need to perform the broadcast
anyway.
For
vmv.vx
for (...)
{
vadd.vv
}
the combination should not take place (when the costs are equal) because of the
frequency consideration in late-combine's costing.
When COST (vmv.vx) + COST (vadd.vv) > COST (vadd.vx) it should take place.
We first need to get these basic building blocks correct before considering
something else.
> Btw, is there any approach to set the cost attached to the
> define_insn_and_split? Which may be more
> friendly to catch it from RTX_COST up to a point.
>
> 51 │ trying to combine definition of r135 in:
> 52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI)
> 53 │ into:
> 54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
> 55 │ REG_DEAD r146:RVVM1DI
> 56 │ successfully matched this instruction to *add_vx_rvvm1di:
> 57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
> 58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
> 59 │ (reg:RVVM1DI 146)))
> 60 │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 32
> (weighted: 258.909092); rejecting replacement
> 61 │
In the end we'll have to capture all patterns (predicated and unpredicated)
anyway so just tackling the unsplit ones only helps so much.
--
Regards
Robin