Thanks @ekalda. It is great to see us having conversations on bringing in SVE.
The main question we want to resolve is likely going to be **what TIR spec,
carrying the SVE info, goes into codegen**.
Three alternatives have been discussed so far:
### A0: Loop with annotation but body as scalar
```python
for (i: int32, 0, 20;i, annotation={"VLA"}) {
C_2[i] = A_2[i] + B_2[i];
}
```
### A1: Vectorized loop with constant vector factor
```python
for (i: int32, 0, 20; i) {
C_2[ramp(i, 0, 5)] = A_2[ramp(i, 0, 5)] + B_2[ramp(i, 0, 5)];
}
```
### A2: Vectorized loop with some form of TIR repr for sve vector
```python
for (i: int32, 0, 20; i) {
C_2[ramp(i, 0, vscale)] = A_2[ramp(i, 0, vscale)] + B_2[ramp(i, 0, vscale)];
}
```
This would involve updates to the ramp node in TIR. See the
`kScalableVectorLaneMark` comment in the [previous
discussion](https://github.com/apache/tvm-rfcs/pull/18).
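To make the difference between a constant vector factor (A1) and a runtime vector length (A2) concrete, here is a minimal plain-Python emulation of what each loop form computes. It is a sketch only, not TVM or SVE code: the function names, the `vl` parameter and the list-based predicate are all assumptions made for illustration.

```python
def add_fixed_lanes(a, b, lanes):
    """Constant vector factor: `lanes` is known at compile time."""
    n = len(a)
    # Assumption of this sketch: the extent is a multiple of the lane count.
    assert n % lanes == 0, "fixed-lane form assumes extent % lanes == 0"
    c = [0] * n
    for i in range(0, n, lanes):
        for lane in range(lanes):  # stands in for one vector instruction
            c[i + lane] = a[i + lane] + b[i + lane]
    return c


def add_scalable(a, b, vl):
    """Scalable form: `vl` is only known at run time; a predicate masks the tail."""
    n = len(a)
    c = [0] * n
    i = 0
    while i < n:
        # A lane is active only while its element index is still in bounds.
        pred = [i + lane < n for lane in range(vl)]
        for lane in range(vl):
            if pred[lane]:
                c[i + lane] = a[i + lane] + b[i + lane]
        i += vl
    return c
```

With predication the scalable form gives the same result for any `vl`, including values that do not divide the extent of 20, which is the property a codegen-only conversion would have to preserve.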
## Discussion
The three alternatives above set the stage for discussion. This RFC
proposes A1, because it is a change to codegen only and does not change TIR. If
A1 can be implemented correctly, then I think it is a positive step (close to
the S0-type change we had in other conversations), even if we want to do things in
several stages (with follow-up S1 changes).
The main question for discussion is how we can implement A1 robustly.
Turning specialized code into a general form is a bit like raising (from a
special case to the general one), so it would be good to add a high-level
description of the pattern match and conversion rules. For some background, I
initially thought there might be traps when the code contains
specializations to the lane count, but after thinking a bit more I find that my
initial counterexample is actually fine under A1. So I am more convinced of this
approach.
Something along the following lines:
We would only apply the SVE specialization if the code satisfies the following
pattern:
- Pattern match all ramped loads/stores `A[ramp(iter*lanes, 0, lanes)]` to ensure
they have the same lanes, then change the lanes to VL with predication.
- Change the outer loop iter to a vector loop.
- If there is a vector load/store that does not satisfy the pattern, we abort.
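
The matching step above could be sketched roughly as follows. This is a toy model, not TVM code: `Ramp`, `Access` and `match_sve_pattern` are hypothetical names, and the ramp is reduced to "iterator times a scale, with a lane count" so that the check `base == iter*lanes` becomes `scale == lanes`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Ramp:
    """Toy stand-in for a ramp index whose base is `iter_var * scale`."""
    iter_var: str
    scale: int  # multiplier applied to the loop iterator in the base
    lanes: int


@dataclass(frozen=True)
class Access:
    """A vector load or store of `buffer` at `index` (hypothetical type)."""
    buffer: str
    index: object  # a Ramp, or anything else for a non-matching access
    is_store: bool = False


def match_sve_pattern(loop_var: str, accesses: List[Access]) -> Optional[int]:
    """Return the common lane count if every access matches, else None (abort)."""
    lanes = None
    for acc in accesses:
        idx = acc.index
        if not isinstance(idx, Ramp):
            return None  # not a ramped access: abort
        if idx.iter_var != loop_var or idx.scale != idx.lanes:
            return None  # base is not `iter * lanes`: abort
        if lanes is None:
            lanes = idx.lanes
        elif idx.lanes != lanes:
            return None  # mixed lane counts: abort
    return lanes


example_body = [
    Access("A_2", Ramp("i", 5, 5)),
    Access("B_2", Ramp("i", 5, 5)),
    Access("C_2", Ramp("i", 5, 5), is_store=True),
]
common_lanes = match_sve_pattern("i", example_body)
```

Only when a lane count comes back would the rewrite go ahead (replace the lanes with VL, add predication, turn the outer loop into a vector loop); a `None` result corresponds to the abort case, leaving the fixed-width code untouched.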