I see, let the vec_dup enter the rtx_cost again to append the total to vmv, I
have a try testing. For example with below change:
+ switch (rcode)
+ {
+ case VEC_DUPLICATE:
+ *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
+ break;
+ case PLUS:
+ {
+ rtx op_0 = XEXP (x, 0);
+ rtx op_1 = XEXP (x, 1);
+ if (GET_CODE (op_0) == VEC_DUPLICATE
+ || GET_CODE (op_1) == VEC_DUPLICATE)
+ *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
+ else
+ *total = COSTS_N_INSNS (1);
+ break;
+ }
+ default:
+ *total = COSTS_N_INSNS (1);
+ break;
+ }
+
+ return true;
For case_0, GR2VR is 0, we will have late-combine as blow:
51 │ trying to combine definition of r135 in:
52 │ 11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
53 │ into:
54 │ 18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
55 │ REG_DEAD r146:RVVM1SI
56 │ successfully matched this instruction to *add_vx_rvvm1si:
57 │ (set (reg:RVVM1SI 147 [ vect__6.8_16 ])
58 │ (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [
x ]) 0))
59 │ (reg:RVVM1SI 146)))
60 │ original cost = 8 + 4 (weighted: 39.483637), replacement cost = 8
(weighted: 64.727273); rejecting replacement
For case_0, GR2VR is 1, we will have late-combine as blow:
51 │ trying to combine definition of r135 in:
52 │ 11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
53 │ into:
54 │ 18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
55 │ REG_DEAD r146:RVVM1SI
56 │ successfully matched this instruction to *add_vx_rvvm1si:
57 │ (set (reg:RVVM1SI 147 [ vect__6.8_16 ])
58 │ (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [
x ]) 0))
59 │ (reg:RVVM1SI 146)))
60 │ original cost = 12 + 4 (weighted: 43.043637), replacement cost = 12
(weighted: 97.090910); rejecting replacement
For case_0, GR2VR is 2, we will have late-combine as blow:
51 │ trying to combine definition of r135 in:
52 │ 11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
53 │ into:
54 │ 18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
55 │ REG_DEAD r146:RVVM1SI
56 │ successfully matched this instruction to *add_vx_rvvm1si:
57 │ (set (reg:RVVM1SI 147 [ vect__6.8_16 ])
58 │ (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [
x ]) 0))
59 │ (reg:RVVM1SI 146)))
60 │ original cost = 16 + 4 (weighted: 46.603637), replacement cost = 16
(weighted: 129.454546); rejecting replacement
The vadd v, vec_dup(x) seems has the same cost as vec_dup here. I am also
confused about the how we calculate the
vadd v, vec_dup(x), can we just set its' cost to vadd.vx? given we have
define_insn_and_split to match the pattern and
emit the vadd.vx directly. And it matches the expr we mentioned vadd.vv + vec
== vadd.vx.
Please help to correct me if misunderstanding.
------------- cut line -------------
> Could you give a short example where that doesn't work so I can have a quick
> look myself?
Sorry for misleading, it is not something doesn't work, just would like to make
it clean before next step as it
is quit complicate up to a point.
However, I use below example for testing, case_0 for only vadd.vv in loop,
while case_1 for vmv and vadd.vv in loop.
1 │ #include <stdint.h>
2 │
3 │ #define T int32_t
4 │ #define OP +
5 │
6 │ void
7 │ test_vx_binary_case_0 (T * restrict out, T * restrict in, T x,
unsigned n)
8 │ {
9 │ for (unsigned i = 0; i < n; i++)
10 │ out[i] = in[i] OP x;
11 │ }
12 │
13 │ void
14 │ test_vx_binary_case_1 (T * restrict out, T * restrict a, T b, unsigned
n)
15 │ {
16 │ unsigned k = 0;
17 │
18 │ T xb = b + 3;
19 │
20 │ while (k < n) {
21 │ out[k + 0] = a[k + 0] OP xb;
22 │ out[k + 1] = a[k + 1] OP xb;
23 │ out[k + 2] = a[k + 2] OP xb;
24 │ out[k + 3] = a[k + 3] OP xb;
25 │ k += 4;
26 │
27 │ xb = xb ^ 0x3f;
28 │
29 │ out[k + 0] = a[k + 0] OP xb;
30 │ out[k + 1] = a[k + 1] OP xb;
31 │ out[k + 2] = a[k + 2] OP xb;
32 │ out[k + 3] = a[k + 3] OP xb;
33 │ k += 4;
34 │ }
35 │ }
Pan
-----Original Message-----
From: Robin Dapp <[email protected]>
Sent: Tuesday, April 29, 2025 2:31 PM
To: Li, Pan2 <[email protected]>; Robin Dapp <[email protected]>;
[email protected]
Cc: [email protected]; [email protected]; [email protected]; Chen,
Ken <[email protected]>; Liu, Hongtao <[email protected]>; Robin Dapp
<[email protected]>
Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx
on GR2VR cost
> But this is not that good enough here if my understanding is correct.
> As vmv.v.x is somehow equivalent to vec_dup but doesn't ref GR2VR,
But it should. Can't we do something like:
if (riscv_v_ext_mode_p (mode))
{
switch (GET_CODE (x))
{
case VEC_DUPLICATE:
// will add 0, 4, 8 in case of GR2VR = 0, 1, 2
*total += GR2VR * COST_N_INSNS (1);
break;
case PLUS:
rtx op2 = XEXP (XEXP (x, 1), 0);
if (GET_CODE (op2) == VEC_DUPLICATE)
*total += GR2VR * COST_N_INSNS (1);
...
break;
Could you give a short example where that doesn't work so I can have a quick
look myself?
Thanks.
--
Regards
Robin