Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 while vadd.vx costs 1 + 2, making it cheaper?

Yes, looks like we just need to assign the GR2VR cost to vec_duplicate. I also
tried different costs here to see the impact on late-combine.

+  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
+    cost_val = get_vector_costs ()->regmove->GR2VR;
+  }

---- cut line ----

If GR2VR is 2, late-combine performs the combination as below.

  trying to combine definition of r135 in:
     11: r135:RVVM1DI=vec_duplicate(r150:DI)
  into:
     18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
        REG_DEAD r146:RVVM1DI
  successfully matched this instruction to *add_vx_rvvm1di:
  (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
      (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
          (reg:RVVM1DI 146)))
  original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4 (weighted: 32.363637); keeping replacement
  rescanning insn with uid = 18.
  updating insn 18 in-place
  verify found no changes in insn with uid = 18.
  deleting insn 11
  deleting insn with uid = 11.

---- cut line ----

If GR2VR is 1, late-combine performs the combination as below.

  trying to combine definition of r135 in:
     11: r135:RVVM1DI=vec_duplicate(r150:DI)
  into:
     18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
        REG_DEAD r146:RVVM1DI
  successfully matched this instruction to *add_vx_rvvm1di:
  (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
      (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
          (reg:RVVM1DI 146)))
  original cost = 4 + 4 (weighted: 35.923637), replacement cost = 4 (weighted: 32.363637); keeping replacement
  rescanning insn with uid = 18.
  updating insn 18 in-place
  verify found no changes in insn with uid = 18.
  deleting insn 11
  deleting insn with uid = 11.

---- cut line ----

If GR2VR is 0, it is normalized to 1, so the combine log looks the same as
above.

IMHO this is roughly how it should look:

With GR2VR=2:
vadd.vv: cost 4 = COSTS_N_INSNS (1)
vmv.v.x: cost COSTS_N_INSNS (GR2VR) = 8
vadd.vx: cost 4 + GR2VR * COSTS_N_INSNS (1) = 12

With GR2VR=1:
vadd.vv: cost 4
vmv.v.x: cost 4
vadd.vx: cost 4 + 4 = 8

With GR2VR=0:
vadd.vv: cost 4
vmv.v.x: cost 4 (or less?)
vadd.vx: cost 4 + 0 * COSTS_N_INSNS (1) = 4
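For sanity-checking the arithmetic, here is a small C sketch that recomputes
the table. COSTS_N_INSNS mirrors GCC's definition (N * 4). The helper names
are hypothetical. Keeping vmv.v.x at COSTS_N_INSNS (1) when GR2VR is 0 is an
assumption, since the table leaves that open.

```c
#include <stdio.h>

/* Same definition as GCC's rtl.h: one instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: plain vector-vector add, independent of GR2VR.  */
int vadd_vv_cost (int gr2vr)
{
  (void) gr2vr;
  return COSTS_N_INSNS (1);
}

/* Hypothetical helper: broadcast a GPR into a vector register.
   Assumption: never cheaper than COSTS_N_INSNS (1), even for GR2VR == 0.  */
int vmv_v_x_cost (int gr2vr)
{
  return COSTS_N_INSNS (gr2vr > 0 ? gr2vr : 1);
}

/* Hypothetical helper: the add plus the GR2VR transfer folded into it.  */
int vadd_vx_cost (int gr2vr)
{
  return COSTS_N_INSNS (1) + gr2vr * COSTS_N_INSNS (1);
}

/* Print the per-insn costs for GR2VR = 2, 1, 0 together with the
   unweighted cost of the original pair (vmv.v.x + vadd.vv).  */
void print_cost_table (void)
{
  for (int gr2vr = 2; gr2vr >= 0; gr2vr--)
    printf ("GR2VR=%d: vadd.vv=%d vmv.v.x=%d vadd.vx=%d, "
            "vmv.v.x+vadd.vv=%d\n",
            gr2vr, vadd_vv_cost (gr2vr), vmv_v_x_cost (gr2vr),
            vadd_vx_cost (gr2vr),
            vmv_v_x_cost (gr2vr) + vadd_vv_cost (gr2vr));
}
```

Unweighted, vadd.vx ties with the vmv.v.x + vadd.vv pair for GR2VR 1 and 2
(8 vs 8, 12 vs 12) and is cheaper for GR2VR 0 (4 vs 8).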

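So with GR2VR > 0 we would perform the replacement when both insns execute with similar frequency; if the vmv.v.x has been hoisted into a less frequently executed block, keeping it separate is cheaper. With GR2VR == 0 we should always do it.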
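To make the frequency argument concrete, here is a hypothetical model, not
late-combine's actual code. It weights each insn's cost by the execution
frequency of its block and compares the two sides. Assumptions: integer
frequencies stand in for the real profile weights, and a tie keeps the
replacement.

```c
#include <stdbool.h>

/* Same definition as GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model, not GCC code: return true if folding the vmv.v.x
   into vadd.vx is no more expensive once each insn's cost is weighted by
   the execution frequency of its block.
   DUP_FREQ: frequency of the block holding the vmv.v.x.
   ADD_FREQ: frequency of the block holding the vadd.
   Assumption: a tie keeps the replacement.  */
bool replacement_wins (int gr2vr, long dup_freq, long add_freq)
{
  /* Per-insn costs from the table; vmv.v.x stays at COSTS_N_INSNS (1)
     when GR2VR is 0 (assumption).  */
  long vmv_v_x = COSTS_N_INSNS (gr2vr > 0 ? gr2vr : 1);
  long vadd_vv = COSTS_N_INSNS (1);
  long vadd_vx = COSTS_N_INSNS (1) + gr2vr * COSTS_N_INSNS (1);

  long original = vmv_v_x * dup_freq + vadd_vv * add_freq;
  long replacement = vadd_vx * add_freq;
  return replacement <= original;
}
```

For GR2VR = g > 0 the condition reduces to 4g * add_freq <= 4g * dup_freq,
i.e. add_freq <= dup_freq. So replacement_wins (2, 10, 10) is true, while
replacement_wins (2, 1, 10), a broadcast hoisted out of a loop that runs ten
times, is false. For GR2VR = 0 it reduces to 0 <= 4 * dup_freq, which holds
for any non-negative frequency, e.g. replacement_wins (0, 1, 10) is true.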

A vmv.v.x cost of 4 with GR2VR == 0 is a bit debatable, but setting it to 0 would also seem off.

--
Regards
Robin
