On 5/30/23 16:01, 钟居哲 wrote:
I agree with Andrew.

And I don't think this patch is appropriate, for the following reasons:
1. This patch increases the vector workload on the machine, since
      it converts scalar load + vmv.v.x into vmv.v.i + vsll.vi.
This is probably uarch dependent. I can probably construct cases where the first will be better and I can probably construct cases where the latter will be better. In fact the recommendation from our uarch team is to generally do this stuff on the vector side.
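
For readers following along, the transformation under discussion only applies when the broadcast constant can be written as a small immediate shifted left. Here is a minimal sketch of that check in C, not the patch's actual code: `split_imm_shift` is a hypothetical helper name, and it assumes the 5-bit signed immediate of vmv.v.i and the 5-bit unsigned shift amount of vsll.vi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): try to write `value` as
   imm << shift, with imm a 5-bit signed immediate (vmv.v.i) and shift
   a 5-bit unsigned immediate (vsll.vi).  Returns true and fills in the
   decomposition with the smallest shift; shift == 0 means a plain
   vmv.v.i is enough.  */
static bool split_imm_shift(int64_t value, int *imm, int *shift)
{
    assert(imm != 0 && shift != 0);
    for (int s = 0; s <= 31; s++) {
        for (int i = -16; i <= 15; i++) {
            /* Multiply rather than shift so negative immediates are
               well defined in C.  */
            if ((int64_t)i * ((int64_t)1 << s) == value) {
                *imm = i;
                *shift = s;
                return true;
            }
        }
    }
    return false;
}
```

For example, 256 splits into immediate 8 and shift 5, so it could be built with vmv.v.i + vsll.vi, while an odd constant such as 0x12345 has no such split and would still need the scalar load + vmv.v.x form.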



2. For a multi-issue OoO machine, scalar instructions are very cheap
     when they are interleaved with vector code. For example, in a
     sequence like this:
       scalar insn
       scalar insn
       vector insn
       scalar insn
       vector insn
       ....
     we can issue multiple instructions simultaneously, and the latency
     of the scalar instructions will be hidden, so the scalar instructions
     are cheap. This patch, by contrast, increases the vector pipeline
     workload, which is not friendly to the OoO machines I mentioned above.
I probably need to be careful what I say here :-) I'll go with: mixing vector/scalar code may incur certain penalties on some microarchitectures, depending on the exact code sequences involved.


3. I can imagine the only benefit of this patch is that we can reduce
     scalar register pressure in some extreme circumstances. However, I
     don't think this benefit is "real": once the vector instruction
     scheduling model and cost model are well tuned, GCC should schedule
     the instruction sequence well enough to keep such register live
     ranges very short when scalar register pressure is very high.

Overall, I disagree with this patch.
What I think this all argues is that it'll likely need to be uarch dependent. I'm not yet sure how to describe the properties of the uarch concisely enough to put into our costing structure, though.
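
Purely as an illustration of what a uarch-dependent choice could look like, and not a proposal for the real costing structure, here is a sketch in C. The struct `vec_const_costs`, its fields, and `prefer_vector_side` are all hypothetical names, and any numbers fed into it would be made up rather than measured.

```c
#include <stdbool.h>

/* Hypothetical per-uarch knobs; this is NOT GCC's actual tuning
   structure, only a sketch of the trade-off discussed above.  */
struct vec_const_costs {
    int scalar_load;      /* materialize the constant in a scalar register */
    int scalar_to_vector; /* vmv.v.x, including any scalar-to-vector transfer penalty */
    int vec_imm_move;     /* vmv.v.i */
    int vec_shift_imm;    /* vsll.vi */
};

/* Return true when the vector-side sequence (vmv.v.i + vsll.vi) is
   estimated to be no more expensive than scalar load + vmv.v.x.  */
static bool prefer_vector_side(const struct vec_const_costs *c)
{
    int scalar_path = c->scalar_load + c->scalar_to_vector;
    int vector_path = c->vec_imm_move + c->vec_shift_imm;
    return vector_path <= scalar_path;
}
```

A core where moving data from the scalar side to the vector side is expensive would set `scalar_to_vector` high and pick the vector sequence; a wide OoO core with cheap scalar issue would set it low and keep the scalar load. Whether four integers capture enough of the uarch is exactly the open question.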

jeff
