钟居哲 <juzhe.zh...@rivai.ai> writes: >>> Both approaches are fine. I'm not against one or the other. > >>> What I didn't understand was why your patch only reuses existing IVs >>> for max_nscalars_per_iter == 1. Was it to avoid having to do a >>> multiplication (well, really a shift left) when moving from one >>> rgroup to another? E.g. if one rgroup had; > >>> nscalars_per_iter == 2 && factor == 1 > >>> and another had: > >>> nscalars_per_iter == 4 && factor == 1 > >>> then we would need to mulitply by 2 when going from the first rgroup >>> to the second. > >>> If so, avoiding a multiplication seems like a good reason for the choice >>> you were making in the path. But we then need to check >>> max_nscalars_per_iter == 1 for both the source rgroup and the >>> destination rgroup, not just the destination. And I think the >>> condition for “no multiplication needed” should be that: > > Oh, I didn't realize such complicated problem. Frankly, I didn't understand > well > rgroup. Sorry about that :). > > I just remember last time you said I need to handle multiple-rgroup > not only for SLP but also non-SLP (which is vec_pack_trunk that I tested). > Then I asked you when is non-SLP, you said max_nscalars_per_iter == 1.
Yeah, max_nscalars_per_iter == 1 is the right way of checking for non-SLP. But I'm never been convinced that SLP vs. non-SLP is a meaningful distinction for this patch (that is, the parts that don't use SELECT_VL). SLP vs. non-SLP matters for SELECT_VL. But the rgroup abstraction should mean that SLP vs. non-SLP doesn't matter otherwise. Thanks, Richard