Re: [PATCH] RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

Jeff Law Tue, 06 May 2025 06:06:35 -0700



On 4/16/25 8:32 AM, Paul-Antoine Arras wrote:

Please find attached an updated patch with an additional cost model. By default, an instruction costs 4 and the penalty for moving data from floating-point to vector register is 2; thus, vfmadd.vf costs 6, which still makes it cheaper than vec_duplicate + vfmadd.vv. Different tuning parameters can alter this tradeoff though.

Thanks.
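
For reference, the arithmetic in the quoted paragraph can be sketched as standalone C++. This is only an illustration, not GCC code: costs_n_insns mirrors the COSTS_N_INSNS scaling of 4 units per instruction, kFr2VrPenalty is a hypothetical name for the default FR2VR value of 2 quoted above, and treating vec_duplicate as exactly one instruction is an assumption.

```cpp
#include <cassert>

// Mirrors the COSTS_N_INSNS scaling: one instruction is 4 cost units.
constexpr int costs_n_insns(int n) { return n * 4; }

// Hypothetical name for the default FP-to-vector register move penalty.
constexpr int kFr2VrPenalty = 2;

// Fused vector-scalar multiply-add: one instruction plus the move penalty.
constexpr int cost_vfmadd_vf(int fr2vr) { return costs_n_insns(1) + fr2vr; }

// Unfused sequence, assuming vec_duplicate and vfmadd.vv each count as
// one instruction.
constexpr int cost_dup_plus_vfmadd_vv() {
  return costs_n_insns(1) + costs_n_insns(1);
}

// True when the fused form is strictly cheaper for a given penalty.
constexpr bool fused_is_cheaper(int fr2vr) {
  return cost_vfmadd_vf(fr2vr) < cost_dup_plus_vfmadd_vv();
}

// Sanity check of the default tradeoff described in the quoted paragraph.
inline void check_default_tradeoff() {
  assert(cost_vfmadd_vf(kFr2VrPenalty) == 6);
  assert(fused_is_cheaper(kFr2VrPenalty));
}
```

Under these assumptions the fused form wins 6 to 8 by default, and a tuning whose FR2VR penalty exceeds 4 would reverse the comparison.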

We recently received our own BPI board, so I was able to run 503.bwaves_r on it. Unfortunately, the DIC reduction does not translate into similar execution time gains. The vector-scalar version is only 0.33% faster on average over 3 iterations.

That's disappointing, but not a huge surprise. Vector FP on the K1/M1 chip in those units is hard to do profitably -- your gains could well be masked by the overall poor performance profile of those units.

diff --git gcc/config/riscv/riscv.cc gcc/config/riscv/riscv.cc
index 38f3ae7cd84..0f0cf04bdd9 100644
--- gcc/config/riscv/riscv.cc
+++ gcc/config/riscv/riscv.cc
@@ -3864,6 +3864,18 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
   if (riscv_v_ext_mode_p (mode))
     {
       *total = COSTS_N_INSNS (1);
+      if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS) && outer_code == SET)
+       {
+         rtx plus_op0 = XEXP (x, 0);
+         if (GET_CODE (plus_op0) == MULT)
+           {
+             rtx mult_op0 = XEXP (plus_op0, 0);
+             if (GET_CODE (mult_op0) == VEC_DUPLICATE)
+               {
+                 *total += get_vector_costs ()->regmove->FR2VR;
+               }
+           }
+       }
       return true;
     }

So this probably needs minor updates now that Pan's code is in, though I suspect combining your work and his in the costing code will be trivial.


Functionally, I would suggest one change:

if (FLOAT_MODE_P (mode))
  *total += get_vector_costs ()->regmove->FR2VR;
else
  *total += get_vector_costs ()->regmove->GR2VR;

That way costing ought to work for the vector integer multiply-add/sub operations as well.

You'll need to double-check whether FLOAT_MODE_P works on a vector mode; if not, you may need to get the inner mode.
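
To illustrate the intent, here is a rough standalone model of that selection. Everything in it is a hypothetical stand-in rather than GCC code: ModeClass, RegmoveCosts, is_float_mode and scalar_to_vector_penalty are made-up names, and is_float_mode simply assumes the float test also accepts vector float modes, which is exactly the behavior that needs checking for the real FLOAT_MODE_P.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a machine mode's class.
enum class ModeClass { Int, Float, VectorInt, VectorFloat };

// Hypothetical stand-in for the regmove tuning entries.
struct RegmoveCosts {
  int gr2vr;  // penalty for moving a general-purpose register to a vector register
  int fr2vr;  // penalty for moving a floating-point register to a vector register
};

// Assumed behavior: scalar and vector float modes both count as "float".
constexpr bool is_float_mode(ModeClass c) {
  return c == ModeClass::Float || c == ModeClass::VectorFloat;
}

// Pick the penalty that matches where the duplicated scalar comes from.
constexpr int scalar_to_vector_penalty(ModeClass c, const RegmoveCosts &costs) {
  return is_float_mode(c) ? costs.fr2vr : costs.gr2vr;
}
```

If the real predicate turns out to reject vector modes, the same selection would have to be made on the inner (element) mode instead.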



Otherwise it looks pretty good to me.

Robin, any recommendations from your side?

jeff
