> Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4
> while vadd.vx costs 1 + 2, making it cheaper?
Yes, looks we need to just assign the GR2VR when vec_dup. I also tried diff
cost here to see
the impact to late-combine.
+ if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
+ cost_val = get_vector_costs ()->regmove->GR2VR;
+ }
---- cut line ----
If GR2VR is 2, we will perform the combine as below.
51 trying to combine definition of r135 in:
52 11: r135:RVVM1DI=vec_duplicate(r150:DI)
53 into:
54 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
55 REG_DEAD r146:RVVM1DI
56 successfully matched this instruction to *add_vx_rvvm1di:
57 (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
58 (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
59 (reg:RVVM1DI 146)))
60 original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4
(weighted: 32.363637); keeping replacement
61 rescanning insn with uid = 18.
62 updating insn 18 in-place
63 verify found no changes in insn with uid = 18.
64 deleting insn 11
65 deleting insn with uid = 11.
---- cut line ----
If GR2VR is 1, we will perform the combine as below.
51 │ trying to combine definition of r135 in:
52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI)
53 │ into:
54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
55 │ REG_DEAD r146:RVVM1DI
56 │ successfully matched this instruction to *add_vx_rvvm1di:
57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
59 │ (reg:RVVM1DI 146)))
60 │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 4
(weighted: 32.363637); keeping replacement
61 │ rescanning insn with uid = 18.
62 │ updating insn 18 in-place
63 │ verify found no changes in insn with uid = 18.
64 │ deleting insn 11
65 │ deleting insn with uid = 11.
---- cut line ----
If GR2VR is 0, it will be normalized to 1 as below, thus the combine log looks
like the same as above.
return cost > 0 ? cost : COSTS_N_INSNS (1); gcc/rtlanal.cc:5766
it looks like we need to reconcile the vadd.vv cost either here ? Or am I
missing something here.
> With such a change the tests wouldn't pass by default (AFAICT) and we would
> need a --param=xx. I wouldn't worry about exposing those details to the user
> for now as we're so early in the cycle and can easily iterate on it. I would
> suggest just adding something in order to make the tests work as expected and
> change things later (if needed).
I see, will append the --param patch into the series.
Pan
-----Original Message-----
From: Robin Dapp <[email protected]>
Sent: Wednesday, April 23, 2025 3:01 PM
To: Li, Pan2 <[email protected]>; Robin Dapp <[email protected]>;
[email protected]
Cc: [email protected]; [email protected]; [email protected]; Chen,
Ken <[email protected]>; Liu, Hongtao <[email protected]>; Robin Dapp
<[email protected]>
Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx
on GR2VR cost
>> The only thing I think we want for the patch (as Pan also raised last time)
>> is
>> the param to set those .vx costs to zero in order to ensure the tests test
>> the
>> right thing (--param=vx_preferred/gr2vr_cost or something).
>
> I see, shall we start a new series for this? AFAIK, we may need some more
> alignment
> for something like --param=xx that exposing to end-user.
>
>> According to patchwork the tests you add pass but shouldn't they actually
>> fail
>> with a GR2VR cost of 2? I must be missing something.
>
> For now the cost of GR2VR is 2, take test vx_vadd-1-i64.c for example, the
> vec_dup + vadd.vv
> has higher cost than vadd.vx, thus perform the late-combine as below.
Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4
while vadd.vx costs 1 + 2, making it cheaper?
IMHO vec_dup should just cost 2 (=GR2VR) rather than 3. All it does is
broadcast (no additional operation), while vadd.vx performs the broadcast (cost
2) as well as an operation (cost 1). So vec_dup + vadd.vv should cost 3, the
same as vadd.vx. In late combine when comparing costs we scale the them by
"frequency". The vadd.vx inside the loop should have higher frequency making
it more costly by default.
With such a change the tests wouldn't pass by default (AFAICT) and we would
need a --param=xx. I wouldn't worry about exposing those details to the user
for now as we're so early in the cycle and can easily iterate on it. I would
suggest just adding something in order to make the tests work as expected and
change things later (if needed).
--
Regards
Robin