Thanks for your input and suggestions @tqchen, much appreciated! I added a 
paragraph about pattern matching TIR; let me know if it makes sense.

Yes, this RFC proposes the A1 change. An A2-style TIR intrinsic is planned 
further down the line: it would let us expose SVE capabilities to the core 
compiler, so we could explore a larger space of optimisations. The decision to 
enable it initially only at the TIR->LLVM boundary came from the realisation 
that we can generate perfectly valid SVE just by looking at the TIR, without 
having to modify it.

I have spent some time playing around with the current LLVM codegen and I think 
you make a very good point about robustness. I have been looking at simple 
vectorized loads and stores (simple here meaning that the stride is 1 and that 
the index expression is a Ramp node, not a complex non-linear calculation with 
Ramp as a leaf node). The main challenge I currently see is that while the 
index itself is 1D at the point of code generation, the loop nest isn't 
necessarily, so I have to work out from the base of the Ramp node which loop 
bound needs changing. It seems to me that we need some sort of analysis pass 
just before the codegen to collect that information. It would have been nice to 
generate the SVE LLVM directly "as we go" during the LLVM codegen, but it seems 
that we emit the LLVM for the loop bounds before we visit the loop body (so 
before we discover the Ramp nodes), and we can't change the bounds afterwards. 
I think an analysis pass would also help with robustness, since we can gather 
as much information from the TIR graph as we need.
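
To make the analysis pass idea a bit more concrete, here is a rough sketch in 
plain Python. It does not use the TVM API: `Var`, `BinOp`, `Ramp`, `Load`, 
`Store`, `For` and `collect` are hypothetical stand-ins for the corresponding 
TIR nodes and for the pass itself. The rule for picking the loop (the innermost 
enclosing loop whose variable appears in the Ramp base) is only an assumption 
for illustration, not something the RFC has settled on.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Set, Union


# Hypothetical toy IR, loosely modelled on TIR; not the real TVM classes.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    a: "Expr"
    b: "Expr"


@dataclass(frozen=True)
class Ramp:
    base: "Expr"
    stride: int
    lanes: int


@dataclass(frozen=True)
class Load:
    buffer: str
    index: "Expr"


Expr = Union[int, Var, BinOp, Ramp, Load]


@dataclass(frozen=True)
class Store:
    buffer: str
    index: Expr
    value: Expr


@dataclass(frozen=True)
class For:
    loop_var: Var
    extent: int
    body: tuple  # nested For / Store statements


def free_vars(expr: Expr) -> Set[str]:
    """Names of the variables appearing in a scalar index expression."""
    if isinstance(expr, Var):
        return {expr.name}
    if isinstance(expr, BinOp):
        return free_vars(expr.a) | free_vars(expr.b)
    return set()


def unit_stride_ramp(index: Expr) -> Optional[Ramp]:
    """Return the index if it is directly a stride-1 Ramp, else None."""
    if isinstance(index, Ramp) and index.stride == 1:
        return index
    return None


def ramps_in_value(expr: Expr) -> Iterator[Ramp]:
    """Yield the simple Ramp indices of all loads inside a stored value."""
    if isinstance(expr, Load):
        ramp = unit_stride_ramp(expr.index)
        if ramp is not None:
            yield ramp
    elif isinstance(expr, BinOp):
        yield from ramps_in_value(expr.a)
        yield from ramps_in_value(expr.b)


def collect(stmt, enclosing: List[str], out: Dict[str, int]) -> None:
    """Map loop variable name -> lanes for loops whose bound needs changing."""
    if isinstance(stmt, For):
        for child in stmt.body:
            collect(child, enclosing + [stmt.loop_var.name], out)
    elif isinstance(stmt, Store):
        ramps = list(ramps_in_value(stmt.value))
        store_ramp = unit_stride_ramp(stmt.index)
        if store_ramp is not None:
            ramps.append(store_ramp)
        for ramp in ramps:
            used = free_vars(ramp.base)
            # Assumed rule: pick the innermost loop that feeds the Ramp base.
            for name in reversed(enclosing):
                if name in used:
                    out[name] = ramp.lanes
                    break


# for i in 0..4: for jo in 0..4: A[ramp(i*16 + jo*4, 1, 4)] = B[...] + 1
base = BinOp("+", BinOp("*", Var("i"), 16), BinOp("*", Var("jo"), 4))
store = Store("A", Ramp(base, 1, 4), BinOp("+", Load("B", Ramp(base, 1, 4)), 1))
nest = For(Var("i"), 4, (For(Var("jo"), 4, (store,)),))

loops_to_rewrite: Dict[str, int] = {}
collect(nest, [], loops_to_rewrite)
print(loops_to_rewrite)
```

The codegen could then consult a table like `loops_to_rewrite` when it reaches 
a loop, before emitting that loop's bound, instead of having to revisit the 
bound after discovering the Ramp nodes in the body.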

I haven't worked a lot with LLVM backends, so I'd be interested in hearing any 
thoughts or suggestions.
