Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4
while vadd.vx costs 1 + 2, making it cheaper?
Yes, it looks like we just need to assign the GR2VR cost for vec_dup. I also
tried different cost values here to see the impact on late-combine.
+  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
+    cost_val = get_vector_costs ()->regmove->GR2VR;
+  }
---- cut line ----
If GR2VR is 2, we will perform the combine as below.
trying to combine definition of r135 in:
   11: r135:RVVM1DI=vec_duplicate(r150:DI)
into:
   18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
      REG_DEAD r146:RVVM1DI
successfully matched this instruction to *add_vx_rvvm1di:
(set (reg:RVVM1DI 147 [ vect__6.8_16 ])
    (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
        (reg:RVVM1DI 146)))
original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4 (weighted: 32.363637); keeping replacement
rescanning insn with uid = 18.
updating insn 18 in-place
verify found no changes in insn with uid = 18.
deleting insn 11
deleting insn with uid = 11.
---- cut line ----
If GR2VR is 1, we will perform the combine as below.
trying to combine definition of r135 in:
   11: r135:RVVM1DI=vec_duplicate(r150:DI)
into:
   18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
      REG_DEAD r146:RVVM1DI
successfully matched this instruction to *add_vx_rvvm1di:
(set (reg:RVVM1DI 147 [ vect__6.8_16 ])
    (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
        (reg:RVVM1DI 146)))
original cost = 4 + 4 (weighted: 35.923637), replacement cost = 4 (weighted: 32.363637); keeping replacement
rescanning insn with uid = 18.
updating insn 18 in-place
verify found no changes in insn with uid = 18.
deleting insn 11
deleting insn with uid = 11.
---- cut line ----
If GR2VR is 0, it will be normalized to 1, so the combine log looks the same
as the GR2VR = 1 case above.
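
For reference, the weighted numbers in the two dumps can be reproduced with a
small C sketch. It assumes that the weighted cost is the sum of each insn cost
multiplied by a per-insn frequency weight. The two weights below are
back-computed from the dumps (they are not printed by the compiler), and
keep_replacement () is a hypothetical helper that only mirrors the "keeping
replacement" outcome, not necessarily the exact rule late-combine applies.

```c
/* Frequency weights inferred from the dumps above (assumption):
   (39.483637 - 32.363637) / 8 = 0.89 for insn 11 and
   32.363637 / 4 = 8.09090925 for insn 18.  */
#define W_INSN11 0.89        /* vec_duplicate, presumably outside the loop */
#define W_INSN18 8.09090925  /* vector add, presumably inside the loop */

/* Weighted cost of the original pair: vec_duplicate + vadd.vv.  */
static double
weighted_original (int dup_cost, int add_cost)
{
  return dup_cost * W_INSN11 + add_cost * W_INSN18;
}

/* Weighted cost of the combined insn (vadd.vx), which replaces insn 18.  */
static double
weighted_replacement (int vx_cost)
{
  return vx_cost * W_INSN18;
}

/* Hypothetical decision helper: keep the replacement when its weighted
   cost is lower than that of the original sequence.  */
static int
keep_replacement (int dup_cost, int add_cost, int vx_cost)
{
  return weighted_replacement (vx_cost) < weighted_original (dup_cost, add_cost);
}
```

With these weights, weighted_original (8, 4) matches the GR2VR = 2 dump and
weighted_original (4, 4) matches the GR2VR = 1 dump. In both cases the
replacement cost stays at 4, which is why the combine is kept either way.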
IMHO this is roughly how it should look:
With GR2VR=2:
vadd.vv: cost 4 = COST_N_INSNS (1)
vmv.v.x: cost COST_N_INSNS (GR2VR) = 8
vadd.vx: cost 4 + GR2VR * COST_N_INSNS (1) = 12
With GR2VR=1:
vadd.vv: cost 4
vmv.v.x: cost 4
vadd.vx: cost 4 + 4 = 8
With GR2VR=0:
vadd.vv: cost 4
vmv.v.x: cost 4 (or less?)
vadd.vx: cost 4 + 0 * COST_N_INSNS (1) = 4
So with GR2VR > 0 the unweighted costs tie (vmv.v.x + vadd.vv == vadd.vx) and
we would perform the replacement when the frequency is similar. With
GR2VR == 0 we should always do so.
vmv.v.x cost 4 with GR2VR cost == 0 is a bit debatable but setting it to 0
would also seem off.
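
A minimal C sketch of the cost table above, only to make the arithmetic
explicit. COST_N_INSNS uses the same scale as GCC's macro; struct rvv_costs,
proposed_costs () and unweighted_gain () are hypothetical names for
illustration and are not part of the patch. vmv.v.x is kept at one insn for
GR2VR == 0, which is the debatable choice mentioned above.

```c
#define COST_N_INSNS(N) ((N) * 4)  /* same scale as GCC's rtl.h */

/* Hypothetical container for the three insn costs discussed above.  */
struct rvv_costs
{
  int vadd_vv;
  int vmv_v_x;
  int vadd_vx;
};

/* Costs as proposed in the table for a given GR2VR value.  */
static struct rvv_costs
proposed_costs (int gr2vr)
{
  struct rvv_costs c;
  c.vadd_vv = COST_N_INSNS (1);
  /* Assumption: never cheaper than one insn, even for GR2VR == 0.  */
  c.vmv_v_x = gr2vr > 0 ? COST_N_INSNS (gr2vr) : COST_N_INSNS (1);
  c.vadd_vx = COST_N_INSNS (1) + gr2vr * COST_N_INSNS (1);
  return c;
}

/* Unweighted saving of replacing vmv.v.x + vadd.vv by vadd.vx.
   0 means a tie, so block frequencies decide; > 0 means always cheaper.  */
static int
unweighted_gain (int gr2vr)
{
  struct rvv_costs c = proposed_costs (gr2vr);
  return c.vmv_v_x + c.vadd_vv - c.vadd_vx;
}
```

Under these assumptions unweighted_gain (2) and unweighted_gain (1) are both
0, while unweighted_gain (0) is 4.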
--
Regards
Robin