On 4/16/25 8:32 AM, Paul-Antoine Arras wrote:


Please find attached an updated patch with an additional cost model. By default, an instruction costs 4 and the penalty for moving data from a floating-point register to a vector register is 2; thus, vfmadd.vf costs 6, which still makes it cheaper than vec_duplicate + vfmadd.vv. Different tuning parameters can alter this trade-off, though.
Thanks.
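To make the default arithmetic concrete, here is a small standalone C model of the trade-off. It is a sketch under stated assumptions: COSTS_N_INSNS(1) is 4 (as in GCC), and vec_duplicate and vfmadd.vv are each charged as one plain vector instruction with no extra penalty. The helpers vf_cost, dup_vv_cost and compare_vf_vs_vv are hypothetical and are not GCC functions.

```c
#include <assert.h>

/* Same scaling as GCC's COSTS_N_INSNS: one instruction = 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical: vfmadd.vf costs one vector insn plus the FR->VR penalty.  */
static int
vf_cost (int fr2vr)
{
  assert (fr2vr >= 0);  /* move penalties are assumed non-negative */
  return COSTS_N_INSNS (1) + fr2vr;
}

/* Hypothetical: vec_duplicate + vfmadd.vv, assumed to be charged as two
   plain vector instructions.  */
static int
dup_vv_cost (void)
{
  return COSTS_N_INSNS (2);
}

/* Returns -1 if .vf is cheaper, 0 on a tie, 1 if the .vv sequence is
   cheaper, for a given FR2VR penalty.  */
static int
compare_vf_vs_vv (int fr2vr)
{
  int vf = vf_cost (fr2vr);
  int vv = dup_vv_cost ();
  return (vf > vv) - (vf < vv);
}
```

With the default FR2VR of 2, the .vf form wins 6 to 8; under these assumptions an FR2VR of 4 produces a tie, and any larger penalty makes the vec_duplicate + vfmadd.vv sequence look cheaper.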



We recently received our own BPI board, so I was able to run 503.bwaves_r on it. Unfortunately, the DIC reduction does not translate into similar execution-time gains: the vector-scalar version is only 0.33% faster on average over 3 iterations.
That's disappointing, but not a huge surprise. Vector FP is hard to do profitably on the K1/M1 chip in those units; your gains could well be masked by their overall poor performance profile.

diff --git gcc/config/riscv/riscv.cc gcc/config/riscv/riscv.cc
index 38f3ae7cd84..0f0cf04bdd9 100644
--- gcc/config/riscv/riscv.cc
+++ gcc/config/riscv/riscv.cc
@@ -3864,6 +3864,18 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
   if (riscv_v_ext_mode_p (mode))
     {
       *total = COSTS_N_INSNS (1);
+      if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS) && outer_code == SET)
+       {
+         rtx plus_op0 = XEXP (x, 0);
+         if (GET_CODE (plus_op0) == MULT)
+           {
+             rtx mult_op0 = XEXP (plus_op0, 0);
+             if (GET_CODE (mult_op0) == VEC_DUPLICATE)
+               {
+                 *total += get_vector_costs ()->regmove->FR2VR;
+               }
+           }
+       }
       return true;
     }
So this probably needs minor updates now that Pan's code is in, though I suspect combining your work and his in the costing code will be trivial.

Functionally, I would suggest one change:

if (FLOAT_MODE_P (mode))
  *total += get_vector_costs ()->regmove->FR2VR;
else
  *total += get_vector_costs ()->regmove->GR2VR;

That way costing ought to work for the vector integer multiply-add/sub operations as well.

You'll need to double-check whether FLOAT_MODE_P works on a vector mode; if not, you may need to get the inner mode.
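The suggested FR2VR/GR2VR selection can be exercised outside GCC with a small model. Everything here is a hypothetical stand-in: rtx_model, mode_class_model, regmove_model and vector_fma_cost are not GCC types or functions. The model mirrors only the shape test from the patch (it inspects the first operand at each level, as the patch does) and sidesteps the FLOAT_MODE_P question by passing the mode class in directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rtx codes, vector mode classes and the
   per-tuning regmove cost table; not GCC's real definitions.  */
enum rtx_code_model { M_PLUS, M_MINUS, M_MULT, M_VEC_DUPLICATE, M_REG, M_SET };
enum mode_class_model { M_VECTOR_INT, M_VECTOR_FLOAT };

struct rtx_model
{
  enum rtx_code_model code;
  const struct rtx_model *op0;  /* first operand only, or NULL */
};

struct regmove_model
{
  int GR2VR;  /* penalty for moving a GPR into a vector register */
  int FR2VR;  /* penalty for moving an FPR into a vector register */
};

#define COSTS_N_INSNS(N) ((N) * 4)

/* Cost of X (a vector rtx of class CLS) under OUTER_CODE.  When X has the
   shape (plus/minus (mult (vec_duplicate ...) ...) ...) inside a SET, add a
   scalar-to-vector move penalty: FR2VR for float modes, GR2VR otherwise.  */
static int
vector_fma_cost (const struct rtx_model *x, enum mode_class_model cls,
                 enum rtx_code_model outer_code,
                 const struct regmove_model *regmove)
{
  int total = COSTS_N_INSNS (1);
  if ((x->code == M_PLUS || x->code == M_MINUS) && outer_code == M_SET)
    {
      const struct rtx_model *plus_op0 = x->op0;
      if (plus_op0 != NULL && plus_op0->code == M_MULT)
        {
          const struct rtx_model *mult_op0 = plus_op0->op0;
          if (mult_op0 != NULL && mult_op0->code == M_VEC_DUPLICATE)
            total += (cls == M_VECTOR_FLOAT) ? regmove->FR2VR
                                             : regmove->GR2VR;
        }
    }
  return total;
}
```

For example, with a made-up table of GR2VR = 3 and FR2VR = 2, a float multiply-add with a duplicated scalar costs 6, the integer form costs 7, and a plain .vv-shaped expression stays at 4.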


Otherwise it looks pretty good to me.

Robin, any recommendations from your side?

jeff
