Thanks @ekalda. It is great to see us having conversations about bringing in SVE.
The main question we likely need to resolve is **what TIR spec, carrying the SVE
info, goes into codegen**.

Three alternatives have been discussed so far:

### A0: Loop with annotation but body as scalar

```python
  for (i: int32, 0, 20;i, annotation={"VLA"}) {
    C_2[i] = A_2[i] + B_2[i];
  }
```
### A1: Vectorized loop with constant vector factor 

```python
  for (i: int32, 0, 20; i) {
    C_2[ramp(i, 0, 5)] = A_2[ramp(i, 0, 5)] + B_2[ramp(i, 0, 5)];
  }
```

### A2: Vectorized loop with some form of TIR representation for SVE vectors

```python
  for (i: int32, 0, 20; i) {
    C_2[ramp(i, 0, vscale)] = A_2[ramp(i, 0, vscale)] + B_2[ramp(i, 0, vscale)];
  }
```

This would involve updates to the ramp node in TIR. See the
`kScalableVectorLaneMark` comment in the [previous
discussion](https://github.com/apache/tvm-rfcs/pull/18).
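For intuition about what a scalable vector has to express, here is a plain-Python
emulation (not TVM code) of a loop whose vector length `vl` is unknown at compile
time. On SVE the vector length is an implementation-defined multiple of 128 bits,
so a 20-element loop like the ones above generally does not split evenly into
vectors; a per-iteration predicate masks off the tail lanes instead of requiring a
scalar epilogue. The function name `vla_add` is hypothetical, and this is a sketch
of the semantics only, not of any codegen.

```python
from typing import List


def vla_add(a: List[int], b: List[int], vl: int) -> List[int]:
    """Emulate a predicated, vector-length-agnostic loop computing a + b.

    `vl` plays the role of the hardware vector length in elements: on SVE it
    is only known at runtime, so the same loop must be correct for any vl >= 1.
    """
    if vl < 1:
        raise ValueError("vl must be >= 1")
    n = len(a)
    c = [0] * n
    i = 0
    while i < n:
        # Per-iteration predicate: lane k is active only while i + k < n,
        # similar in spirit to the predicate SVE's WHILELT instruction produces.
        pred = [i + k < n for k in range(vl)]
        for k, active in enumerate(pred):
            if active:  # inactive lanes neither load nor store
                c[i + k] = a[i + k] + b[i + k]
        i += vl
    return c
```

The same function gives the same result for every `vl`, which is exactly the
property vector-length-agnostic code needs.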

## Discussion
The three alternatives above set the stage for the discussion. This RFC
proposes A1.

This is because A1 is a change to codegen only and does not change TIR. If A1
can be implemented correctly, then I think it is a positive step (close to the
S0-type change we had in other conversations), even if we want to do things in
several stages (with follow-up S1 changes).

The main question for discussion is how we can implement A1 robustly.

Turning specialized code into general code is a bit like raising (from the
special case to the general one), so it would be good to add a high-level
description of the pattern-matching and conversion rules. For some background:
initially I thought there might be some traps when the code contains
specializations to the lane count, but after thinking a bit more, I found that
the counterexample I had in mind is actually fine under A1. So I am more
convinced of this approach.


Something along the following lines:

We would only turn on the SVE specialization if the code satisfies the
following pattern:

- Pattern match all ramped loads/stores `A[ramp(iter*lanes, 0, lanes)]` to
ensure they have the same lanes, then change the lanes to VL with predication.
- Change the outer loop iterator into a vector loop.
- If there is a vector load/store that does not satisfy the pattern, we abort.
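To make these rules concrete, here is a minimal Python sketch over a
hypothetical mini-IR. The `Ramp`, `Access`, and `Loop` classes, the `VL` marker,
and `try_convert_to_vla` are made up for illustration; they are not TVM's node
classes or API. The sketch checks that every access has the form
`A[ramp(iter*lanes, stride, lanes)]` with one shared constant lane count, and
either returns a predicated vector loop with the lanes replaced by `VL` or
returns `None` to abort. The stride field is carried through unchanged, since
the rules above only constrain the base and the lane count.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple, Union

VL = "VL"  # hypothetical marker for the runtime (scalable) vector length


@dataclass(frozen=True)
class Ramp:
    """Hypothetical index ramp(base_var * base_coeff, stride, lanes)."""
    base_var: str
    base_coeff: Union[int, str]
    stride: int
    lanes: Union[int, str]


@dataclass(frozen=True)
class Access:
    """A load of buffer[index], or a store if is_store is set."""
    buffer: str
    index: Ramp
    is_store: bool = False
    predicated: bool = False


@dataclass(frozen=True)
class Loop:
    """kind="serial": extent counts iterations.
    kind="vector": extent counts elements; each iteration advances by VL."""
    var: str
    extent: int
    body: Tuple[Access, ...]
    kind: str = "serial"


def try_convert_to_vla(loop: Loop) -> Optional[Loop]:
    """Return a VLA version of `loop`, or None to keep the fixed-width code."""
    lanes: Optional[int] = None
    for acc in loop.body:
        r = acc.index
        # Only constant lane counts can be raised (skip already-scalable code).
        if not isinstance(r.lanes, int):
            return None
        # Pattern: the base must be iter * lanes for this loop's iterator;
        # anything else aborts the conversion.
        if r.base_var != loop.var or r.base_coeff != r.lanes:
            return None
        # All accesses must agree on the lane count.
        if lanes is None:
            lanes = r.lanes
        elif r.lanes != lanes:
            return None
    if lanes is None:
        return None  # no vector accesses, nothing to convert
    # Replace the constant lanes by VL and predicate every access.
    new_body = tuple(
        replace(acc, index=replace(acc.index, base_coeff=VL, lanes=VL),
                predicated=True)
        for acc in loop.body
    )
    # The outer loop becomes a vector loop over all extent * lanes elements.
    return Loop(loop.var, loop.extent * lanes, new_body, kind="vector")
```

Returning `None` rather than raising keeps the "abort" case cheap: codegen can
simply fall back to the existing fixed-width lowering.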











-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/94#issuecomment-1264656688