Re: [PATCH v2 3/3] vect: Use strided loads for VMAT_STRIDED_SLP.

Robin Dapp Fri, 25 Jul 2025 13:32:28 -0700

> That would definitely be nice to have for both gather and stride loads

I'm not sure I like the direction that's heading ;)

So the loop I'm targeting is x264's satd:

   for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
   {
        a0 = (pix1[0] - pix2[0])...
        a1 = (pix1[1] - pix2[1])...
        a2 = (pix1[2] - pix2[2])...
        a3 = (pix1[3] - pix2[3])...

where DR_STEP is known but non-constant, so STMT_VINFO_STRIDED_P = true.
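
To make the access pattern concrete, here is a minimal self-contained
sketch of such a loop.  It is not the x264 code: the function name
diff_sum_4x4 and the abs-sum reduction are placeholders chosen only for
illustration.  What matters is that each iteration reads a group of four
contiguous bytes and that the step between groups (i_pix1, i_pix2) is
loop-invariant but only known at run time.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the row loop above.  Each iteration reads a
   group of four contiguous bytes from each source; consecutive groups
   are separated by a stride that is only known at run time.  */
int
diff_sum_4x4 (const uint8_t *pix1, ptrdiff_t i_pix1,
              const uint8_t *pix2, ptrdiff_t i_pix2)
{
  int sum = 0;
  for (int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2)
    {
      int a0 = pix1[0] - pix2[0];
      int a1 = pix1[1] - pix2[1];
      int a2 = pix1[2] - pix2[2];
      int a3 = pix1[3] - pix2[3];
      /* Placeholder reduction; the real kernel does more work here.  */
      sum += abs (a0) + abs (a1) + abs (a2) + abs (a3);
    }
  return sum;
}
```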

Right now we always set VMAT_STRIDED_SLP when STMT_VINFO_STRIDED_P.

For the single-element case (and only for that one AFAICT) we can switch to
VMAT_GATHER_SCATTER. Is the idea to relax that and also allow "strided"
gather/scatter for larger groups, involving composition types in particular?
Or maybe I missed the point.

One complication with that is that generic gather/scatter on riscv is also
pretty slow, not sure if it's as bad as on x86 but certainly only rarely a win.

At least right now I'm having a hard time imagining which strategy will be
faster and I'd be more comfortable with a costing decision rather than a static
switch. And we don't compare costs for different strategies but just choose
one for a specific mode. Of course, in the end VMAT_STRIDED_SLP usually
performs scalar loads in order to construct a vector but vector-vector loads
and construction is also possible. Maybe that's better than
gather/scatter/strided. I would need to compare a few cases for real to get
a better feeling for it.
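
As a rough scalar model of two of the strategies being weighed (not what
the vectorizer actually emits), the sketch below fills the same 16-byte
"vector" from four strided rows in two ways.  load_elementwise mimics a
gather, with one byte per lane and a separate offset for each lane.
load_composed mimics a strided load with a composition type, treating
the four bytes of a row as a single 32-bit element so that only four
loads are needed.  Both function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Gather-like model: 16 single-byte loads, each lane with its own
   offset computed from the row and the position within the row.  */
void
load_elementwise (uint8_t out[16], const uint8_t *base, ptrdiff_t stride)
{
  for (int lane = 0; lane < 16; lane++)
    out[lane] = base[(lane / 4) * stride + (lane % 4)];
}

/* Strided model with composition: each row's four bytes are moved as
   one 32-bit chunk, so there are only four loads, one per row.  */
void
load_composed (uint8_t out[16], const uint8_t *base, ptrdiff_t stride)
{
  for (int row = 0; row < 4; row++)
    {
      uint32_t chunk;
      memcpy (&chunk, base + row * stride, sizeof chunk);
      memcpy (out + 4 * row, &chunk, sizeof chunk);
    }
}
```

Both variants produce the same bytes; which one is cheaper depends on
the target, which is why a costing decision seems preferable to a fixed
choice.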

If I didn't miss the point I could give it a shot. Maybe my complicated
dynamic-dispatch scheme for groups larger than the largest vector unit would
fit in there as well then (PR120639).


--
Regards
Robin
