钟居哲 <juzhe.zh...@rivai.ai> writes:
>>> Both approaches are fine.  I'm not against one or the other.
>
>>> What I didn't understand was why your patch only reuses existing IVs
>>> for max_nscalars_per_iter == 1.  Was it to avoid having to do a
>>> multiplication (well, really a shift left) when moving from one
>>> rgroup to another?  E.g. if one rgroup had;
>
>>>   nscalars_per_iter == 2 && factor == 1
>
>>> and another had:
>
>>>   nscalars_per_iter == 4 && factor == 1
>
>>> then we would need to mulitply by 2 when going from the first rgroup
>>> to the second.
>
>>> If so, avoiding a multiplication seems like a good reason for the choice
>>> you were making in the path.  But we then need to check
>>> max_nscalars_per_iter == 1 for both the source rgroup and the
>>> destination rgroup, not just the destination.  And I think the
>>> condition for “no multiplication needed” should be that:
>
> Oh, I didn't realize such complicated problem. Frankly, I didn't understand 
> well
> rgroup. Sorry about that :).
>
> I just remember last time you said I need to handle multiple-rgroup
> not only for SLP but also non-SLP (which is vec_pack_trunk that I tested).
> Then I asked you when is non-SLP, you said max_nscalars_per_iter == 1.

Yeah, max_nscalars_per_iter == 1 is the right way of checking for non-SLP.

But I'm never been convinced that SLP vs. non-SLP is a meaningful
distinction for this patch (that is, the parts that don't use
SELECT_VL).

SLP vs. non-SLP matters for SELECT_VL.  But the rgroup abstraction
should mean that SLP vs. non-SLP doesn't matter otherwise.

Thanks,
Richard

Reply via email to