Hi @tqchen @kparzysz-quic @masahi @tkonolige @smeijer1234, we are looking to revive this work. I have gone through the thread; the summary so far is as follows:
* We want to introduce or enhance a vectorization scheduling primitive, controllable by the user, auto-tuner, or auto-scheduler, that makes the backend codegen use scalable vectors.
* The conversation has converged on extending the existing vectorize scheduling primitive, i.e. s[C].vectorize(..., scalable=True)
* Using this scheduling primitive should result in a for loop with Ramp nodes that carry either an additional argument "is_scalable" or a special number for lanes.
* I think @tqchen was suggesting the special lane number (-1), as opposed to adding an extra argument to all TIR nodes such as Ramp and Broadcast, as well as to DataType (and DLDataType), in order to avoid ABI breakages.
* Moreover, VectorizeLoopScalable will be modified to create a While node.
* The name of the RFC is confusing? (@kparzysz-quic) I suppose what we are adding is vector-length agnostic (VLA) vectorization support for TIR, while demonstrating codegen of VLA-vectorized TIR using Arm(R) SVE codegen.

Please confirm whether this is an accurate summary of the current state.

As for next steps, I would like to propose a resolution for each of the outstanding discussion points and update the RFC.
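
To make the points above concrete, here is a minimal plain-Python sketch of what a vector-length agnostic loop could look like when the lane count is encoded with the -1 sentinel and the vectorized loop is lowered to a while loop. This is not TVM code and does not use any TVM API: the `Ramp` class is a toy stand-in for the TIR node, and `SCALABLE_LANES`, `vla_add`, and `hw_lanes` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sentinel: the lane count is only known at run time.
SCALABLE_LANES = -1


@dataclass(frozen=True)
class Ramp:
    """Toy stand-in for a TIR Ramp: base, base + stride, ... over `lanes` lanes."""
    base: int
    stride: int
    lanes: int

    @property
    def is_scalable(self) -> bool:
        # No extra field is needed; the sentinel lane count carries the flag.
        return self.lanes == SCALABLE_LANES

    def indices(self, hw_lanes: int) -> List[int]:
        # A scalable ramp takes its width from the hardware, a fixed one ignores it.
        n = hw_lanes if self.is_scalable else self.lanes
        return [self.base + i * self.stride for i in range(n)]


def vla_add(a: List[int], b: List[int], hw_lanes: int) -> List[int]:
    """Compute c[i] = a[i] + b[i] with a vector-length agnostic while loop."""
    n = len(a)
    c = [0] * n
    i = 0
    while i < n:  # a while loop, since the step is not a compile-time constant
        idx = Ramp(base=i, stride=1, lanes=SCALABLE_LANES).indices(hw_lanes)
        # Predicate: keep only the lanes still inside the buffer
        # (similar in spirit to an SVE whilelt predicate).
        active = [j for j in idx if j < n]
        for j in active:
            c[j] = a[j] + b[j]
        i += hw_lanes
    return c
```

The same function gives the same result for any positive `hw_lanes`, which is the property we want from VLA vectorization: the vector length is not baked into the generated code.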