> IMHO this is how it should roughly look like: > With GR2VR=2: > vadd.vv: cost 4 = COST_N_INSNS (1) > vmv.v.x: cost COST_N_INSNS (GR2VR) = 8 > vadd.vx: cost 4 + GR2VR * COST_N_INSNS (1) = 12
> With GR2VR=1: > vadd.vv: cost 4 > vmv.v.x: cost 4 > vadd.vx: cost 4 + 4 = 8 > With GR2VR=0: > vadd.vv: cost 4 > vmv.v.x: cost 4 (or less?) > vadd.vx: cost 4 + 0 * COST_N_INSNS (1) = 4 > So with GR2VR > 0 we would perform the replacement when the frequency is > similar. With GR2VR == 0 we should always do. > vmv.v.x cost 4 with GR2VR cost == 0 is a bit debatable but setting it to 0 > would also seem off. Make sense to me, it looks like the combine will always take place if GR2VR is 0, 1 or 2 for now. I am try to customize the cost here to make it fail to combine but get failed with below change. + if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) { + cost_val = 1; + } + + if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0))) + && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 1)))) { + cost_val = 8; + } + + if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0))) + && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 1)))) { + cost_val = 2; // never picked up during combine. + } It takes 8 for original cost as well as replacement(see below combine log). Thus, it will be always keep replacement during combine. 51 │ trying to combine definition of r135 in: 52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI) 53 │ into: 54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI 55 │ REG_DEAD r146:RVVM1DI 56 │ successfully matched this instruction to *add_vx_rvvm1di: 57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ]) 58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ])) 59 │ (reg:RVVM1DI 146))) 60 │ original cost = 4 + 32 (weighted: 262.469092), replacement cost = 32 (weighted: 258.909092); keeping replacement 61 │ rescanning insn with uid = 18. 62 │ updating insn 18 in-place 63 │ verify found no changes in insn with uid = 18. 64 │ deleting insn 11 65 │ deleting insn with uid = 11. Based on above, I have another try to understand how late-combine leverage the RTX_COST. Aka, set vadd v1, (vec_dup(x1)) to 8 and others to 1. + if (rcode == PLUS) { + rtx arg0 = XEXP (x, 0); + rtx arg1 = XEXP (x, 1); + + if (riscv_v_ext_mode_p (GET_MODE (arg1)) + && GET_CODE (arg0) == VEC_DUPLICATE) { + cost_val = 8; + } + } Then the late-combine reject the replacement as expected. Thus, the condition failed to combine may Looks like vmv.vx + vadd.vv < vadd.vx here if my understanding is correct. If so, it will also impact the --param we would like to introduce, a single --param=gr2vr_cost=XXX is not good enough to make sure that the condition is true, we may need --param=vv_cost/vx_cost=XXX. Btw, is there any approach to set the cost attached to the define_insn_and_split? Which may be more friendly to catch it from RTX_COST up to a point. 51 │ trying to combine definition of r135 in: 52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI) 53 │ into: 54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI 55 │ REG_DEAD r146:RVVM1DI 56 │ successfully matched this instruction to *add_vx_rvvm1di: 57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ]) 58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ])) 59 │ (reg:RVVM1DI 146))) 60 │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 32 (weighted: 258.909092); rejecting replacement 61 │ Pan -----Original Message----- From: Robin Dapp <rdapp....@gmail.com> Sent: Thursday, April 24, 2025 8:13 PM To: Li, Pan2 <pan2...@intel.com>; Robin Dapp <rdapp....@gmail.com>; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Robin Dapp <rdapp....@gmail.com> Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost >> Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 >> while vadd.vx costs 1 + 2, making it cheaper? > > Yes, looks we need to just assign the GR2VR when vec_dup. I also tried diff > cost here to see > the impact to late-combine. > > + if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) { > + cost_val = get_vector_costs ()->regmove->GR2VR; > + } > > ---- cut line ---- > > If GR2VR is 2, we will perform the combine as below. > > 51 trying to combine definition of r135 in: > 52 11: r135:RVVM1DI=vec_duplicate(r150:DI) > 53 into: > 54 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI > 55 REG_DEAD r146:RVVM1DI > 56 successfully matched this instruction to *add_vx_rvvm1di: > 57 (set (reg:RVVM1DI 147 [ vect__6.8_16 ]) > 58 (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ])) > 59 (reg:RVVM1DI 146))) > 60 original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4 > (weighted: 32.363637); keeping replacement > 61 rescanning insn with uid = 18. > 62 updating insn 18 in-place > 63 verify found no changes in insn with uid = 18. > 64 deleting insn 11 > 65 deleting insn with uid = 11. > > ---- cut line ---- > > If GR2VR is 1, we will perform the combine as below. > > 51 │ trying to combine definition of r135 in: > 52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI) > 53 │ into: > 54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI > 55 │ REG_DEAD r146:RVVM1DI > 56 │ successfully matched this instruction to *add_vx_rvvm1di: > 57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ]) > 58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ])) > 59 │ (reg:RVVM1DI 146))) > 60 │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 4 > (weighted: 32.363637); keeping replacement > 61 │ rescanning insn with uid = 18. > 62 │ updating insn 18 in-place > 63 │ verify found no changes in insn with uid = 18. > 64 │ deleting insn 11 > 65 │ deleting insn with uid = 11. > > ---- cut line ---- > > If GR2VR is 0, it will be normalized to 1 as below, thus the combine log > looks like the same as above. IMHO this is how it should roughly look like: With GR2VR=2: vadd.vv: cost 4 = COST_N_INSNS (1) vmv.v.x: cost COST_N_INSNS (GR2VR) = 8 vadd.vx: cost 4 + GR2VR * COST_N_INSNS (1) = 12 With GR2VR=1: vadd.vv: cost 4 vmv.v.x: cost 4 vadd.vx: cost 4 + 4 = 8 With GR2VR=0: vadd.vv: cost 4 vmv.v.x: cost 4 (or less?) vadd.vx: cost 4 + 0 * COST_N_INSNS (1) = 4 So with GR2VR > 0 we would perform the replacement when the frequency is similar. With GR2VR == 0 we should always do. vmv.v.x cost 4 with GR2VR cost == 0 is a bit debatable but setting it to 0 would also seem off. -- Regards Robin