Thanks for commenting @tqchen
Could you further clarify a few things for me, please? See my remarks inline.
> Thanks @MeeraN7 . Yes I get what you mean. Right now we are adding a
> "is_scalable" field to indicate that the broadcast and ramp are "context
> dependent" on VL. Additionally, we might need to update DataType to indicate
> a scalable data type.
>
> This context dependency is the missing information I mentioned here.
I don't think I understand what you mean by context dependency. Basically, in
my view of the world, the Ramp node means that we can process elements in a
data-parallel way. How exactly that is done is up to the backends, depending
on the architecture and the code. What we are doing here is annotating the
Ramp node with a bit of state, which is a hint that we want a special kind of
vectorisation. From this point of view it is syntactic sugar: if the hint is
dropped or ignored, the code can still be vectorised, just not in a vector
length agnostic way.
I don't think we need to encode much more information than one boolean that
specifies the vectorisation style. You could indeed argue that this is ad-hoc,
but if we are going to do it in a different way, we would still need to keep a
bit of state around, like the `annotation={"VLA"}` example that you gave
earlier. From that point of view, I don't see any difference.
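To make this concrete, here is a minimal C sketch of the bit of state I have
in mind (the field names are illustrative only, not the actual TVM `RampNode`
definition):
```
#include <stdbool.h>

/* Hypothetical sketch, not the real TIR node: a ramp describes the lane
   indices base, base+stride, base+2*stride, ... */
struct Ramp {
  int base;          /* first element of the sequence */
  int stride;        /* step between consecutive elements */
  int lanes;         /* lane count; a multiplier of VL when scalable */
  bool is_scalable;  /* the one bit of state: request VLA vectorisation */
};

struct Ramp fixed    = {0, 1, 4, false}; /* ramp(0, 1, 4): exactly 4 lanes  */
struct Ramp scalable = {0, 1, 4, true};  /* ramp(0, 1, 4*VL): 4 lanes per VL */
```
If a backend ignores `is_scalable`, the node still describes a valid
fixed-length ramp, which is why I call it syntactic sugar above.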
> The set of code is really undefined and should be parameterized by VL.
What do you mean by undefined here?
> Additionally, considering the case of two loops with different VL1 and VL2
> and want to do some transformations, we might fall into the trap of thinking
> them as same type(because only "is_scalable" is marked) but in reality they
> are not, as a implicit dependency on VL can be ignored.
>
> I can understand that the additional flag can be helpful as we could reuse
> some of the vectorization logic. However, the "is_scalable" field might
> introduce additional confusion as above, and the additional ramp node may not
> carry too much additional information(apart from the fact that we use a
> scalar vs a vector type). So my main question is that whether or not we could
> use a separate normal form to hint the code generator without changing the
> current DataType, ramp and broadcast.
Correct me if I am wrong, but your assumption seems to be that keeping
explicit scalable state in the IR is bad. Looking at your example:
> **N1: A possible loop normal form via annotation**
>
> ```
> for (i: int32, 0, 17;i, annotation={"VLA"}) {
> C_2[i] = A_2[i] + B_2[i];
> }
> ```
This annotation looks equivalent to a loop pragma. In Clang, for example, you
can do:
```
#pragma clang loop vectorize_width(4, scalable)
for (i = 0; i < 17; ++i) {
  C_2[i] = A_2[i] + B_2[i];
}
```
and thus request scalable vectorisation. If the pragma takes effect, the LLVM
vectoriser lowers the loop to operations on the scalable vector IR type
`<vscale x 4 x i32>`.
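To illustrate what vector length agnostic code looks like when written by
hand, here is the ACLE SVE intrinsics view (just for illustration; this is
not what TVM or the LLVM vectoriser would emit):
```
#include <arm_sve.h>
#include <stdint.h>

/* VLA elementwise add: the loop step svcntw() (32-bit lanes per vector) is
   only known at run time, and svwhilelt predicates the tail iteration. */
void vla_add(int32_t *c, const int32_t *a, const int32_t *b, int64_t n) {
  for (int64_t i = 0; i < n; i += svcntw()) {
    svbool_t pg = svwhilelt_b32_s64(i, n);        /* active-lane predicate */
    svint32_t va = svld1_s32(pg, &a[i]);          /* predicated loads      */
    svint32_t vb = svld1_s32(pg, &b[i]);
    svst1_s32(pg, &c[i], svadd_s32_x(pg, va, vb)); /* add + store          */
  }
}
```
The same binary then runs unmodified on hardware with any SVE vector length.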
What I would like to say with these examples is that there are two things
going on, at two different levels:
- The `annotation={"VLA"}` corresponds to a loop pragma in Clang,
- The TIR Ramp node extension corresponds to LLVM IR scalable vector types,
e.g. `<vscale x 4 x i32>`.
And I think these two concepts are different things that both have their
value and place.
If we can only annotate loops as being scalable, we lose the finer-grained
control to request this at the statement level. I don't know whether mixing
fixed-width and scalable vectorisation will be an important use case, but I
think it is possible.
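For example, at the loop level Clang already lets you mix both styles within
one function (a sketch reusing the pragma from above; per-statement mixing
inside a single loop body is what the Ramp extension would make expressible):
```
void mixed(float *c, const float *a, const float *b, int n) {
  #pragma clang loop vectorize_width(4, scalable)  /* request VLA here */
  for (int i = 0; i < n; ++i)
    c[i] = a[i] + b[i];

  #pragma clang loop vectorize_width(4)            /* fixed 4-wide here */
  for (int i = 0; i < n; ++i)
    c[i] = c[i] * b[i];
}
```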
Summarising: I don't think explicitly encoding scalability in TIR nodes is a
bad thing. Quite the opposite, actually: I think we need it, and the
annotation on the loop might be a complementary technique to it.
What do you think?