Re: [PATCH v2 3/3] vect: Use strided loads for VMAT_STRIDED_SLP.

Robin Dapp Mon, 28 Jul 2025 04:10:31 -0700

Yeah, VMAT_STRIDED_SLP is what VMAT_ELEMENTWISE was to non-SLP,
though how we emit the contiguous part of the SLP group depends and it could
be elementwise as fallback.


For the single-element case (and only for that one AFAICT) we can switch to
VMAT_GATHER_SCATTER.  Is the idea to relax that and also allow "strided"
gather/scatter for larger groups, involving composition types in particular?
Or maybe I missed the point.


So the idea would be to, for the loop example above where IIRC pix1/pix2
are char, emit a gather to VnSI, combining four consecutive QImode loads
into one SImode, and then view-convert the result back to a VmQImode
vector.  That also simplifies the offset vector calculation.  The
"fallback" (consider non-power-of-two group size) would of course be to
gather the VmQImode vector directly, and have an offset vector of
{ 0, 1, 2, 3, stride, stride+1, stride+2, stride+3, ... }
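
To make the two offset schemes concrete, here is a minimal scalar sketch in
C.  It is not vectorizer code: the function names are made up for
illustration, the stride is assumed to be in bytes, and gather_composed
assumes a group of exactly four chars so that one group fits a 32-bit lane.
memcpy stands in for both the unaligned SImode load and the view-convert
back to chars.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fallback scheme (hypothetical helper): one byte offset per char element,
   { 0, 1, ..., group-1, stride, stride+1, ... }.  */
static void
offsets_direct (ptrdiff_t *off, size_t nelts, size_t group, ptrdiff_t stride)
{
  assert (group > 0 && nelts % group == 0);
  for (size_t i = 0; i < nelts; i++)
    off[i] = (ptrdiff_t) (i / group) * stride + (ptrdiff_t) (i % group);
}

/* Composed scheme (hypothetical helper): one byte offset per group,
   { 0, stride, 2*stride, ... }.  */
static void
offsets_composed (ptrdiff_t *off, size_t ngroups, ptrdiff_t stride)
{
  for (size_t i = 0; i < ngroups; i++)
    off[i] = (ptrdiff_t) i * stride;
}

/* Scalar emulation of gathering the char vector directly.  */
static void
gather_direct (unsigned char *dst, const unsigned char *base,
               const ptrdiff_t *off, size_t nelts)
{
  for (size_t i = 0; i < nelts; i++)
    dst[i] = base[off[i]];
}

/* Scalar emulation of a gather of 32-bit lanes followed by a view-convert
   back to chars.  Assumes four chars per group.  */
static void
gather_composed (unsigned char *dst, const unsigned char *base,
                 const ptrdiff_t *off, size_t ngroups)
{
  for (size_t i = 0; i < ngroups; i++)
    {
      uint32_t lane;
      memcpy (&lane, base + off[i], sizeof lane);  /* one SImode load */
      memcpy (dst + 4 * i, &lane, sizeof lane);    /* view as four chars */
    }
}
```

Both variants read the same bytes; the composed one just needs a quarter of
the offsets and a quarter of the gathered elements.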

Ok good then we're aligned because that's mostly what I already did (though
obviously inside VMAT_STRIDED_SLP and just for strided load, not more
generically as we want here).

I think refactoring how we do VMAT_STRIDED_SLP vs VMAT_GATHER/SCATTER,
at least and possibly specifically for the case of emulated handling,
might be a good thing.  But it'll require experiments to see how it all
fits together.

Yes, I'll try to play around and try some re-ordering and fallbacks.  My main
concern is choosing a more expensive load (gather) and not being able to fall
back to a lighter-weight vector-vector composition scheme.  But maybe we can
probe in advance if such a scheme is available and what the vector size
trade-offs etc. are.

My current priority is to sort out the analysis-vs-transform "split" and storing
more data from analysis into the SLP node for both load/store and reductions
so that data is also more easily accessible from the cost models.

One thing I remember I wanted to fix is adjusting the vector size for costing
after view-converting a vector (which we're already doing in
VMAT_STRIDED_SLP).  For our microarchitecture the number of elements is
crucial in costing a gather/scatter/strided access.


--
Regards
Robin
