Following a video chat discussion with @vinx13, we touched on a number of 
points, summarized below.  We are also adding @vinx13 as a co-author on this 
RFC.

- Are there cases where the flattening in `StorageFlatten`/`FlattenBuffer` 
should be inferred from buffer properties, rather than explicitly specified by 
the user?  For example, if a buffer has `"texture"` scope, then we know it must 
be flattened to a 2-d buffer.  We concluded that this wouldn't be possible, 
because the number of resulting dimensions isn't sufficient to define the 
flattening being applied.  For example, if a 4-d buffer is being flattened to 
2-d for use in texture memory, the four initial axes `[A, B, C, D]` could be 
flattened to `[A, fuse(B,C,D)]`, `[fuse(A,B), fuse(C,D)]`, or `[fuse(A,B,C), 
D]`, with no clear reason to prefer one of these over the others (see the 
sketch below).
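
  For concreteness, here is a small pure-Python sketch (not TVM API) of the three candidate flattenings of a 4-d buffer `[A, B, C, D]` to 2-d.  The extents are made up for illustration; the point is that all three maps are valid 2-d layouts, so the 2-d target alone cannot determine which transformation to apply.

  ```python
  # Hypothetical extents for a 4-d buffer [A, B, C, D].
  A, B, C, D = 16, 8, 4, 4

  def flatten_1_3(a, b, c, d):
      # [A, fuse(B, C, D)]
      return (a, (b * C + c) * D + d)

  def flatten_2_2(a, b, c, d):
      # [fuse(A, B), fuse(C, D)]
      return (a * B + b, c * D + d)

  def flatten_3_1(a, b, c, d):
      # [fuse(A, B, C), D]
      return ((a * B + b) * C + c, d)

  # The same 4-d index maps to three different, equally valid 2-d indices.
  print(flatten_1_3(1, 2, 3, 0), flatten_2_2(1, 2, 3, 0), flatten_3_1(1, 2, 3, 0))
  ```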

- How will buffer layout transformations be represented in TensorIR schedules?  
`buffer_transform` will be a primitive transformation in TensorIR, applied 
eagerly to the TensorIR computation.
  - In all cases, this would rewrite the buffer shape, and would rewrite 
loads/stores of that buffer.
  - If these loads/stores occur within a series of nested loops that cover all 
values of the buffer, and the body of those loops contains no additional 
computation (e.g. cache read/write), then the loops will be rewritten to 
iterate along the transformed axes, and the remainder of the schedule can be 
written in terms of the transformed axes.  Otherwise, rewriting the loops 
would not be well-defined, and will not be done.
  - The recommendation will be to apply the layout transformations prior to 
any other scheduling passes that could impact the loop structure, so that 
rewriting of the loops is possible (see the sketch below).
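
  As a concrete illustration, here is a minimal sketch of an eagerly applied layout transformation on a TensorIR schedule.  The primitive is referred to as `buffer_transform` in this RFC; the name and signature used below (`transform_layout`) are an assumption and may not match what is ultimately merged.

  ```python
  import tvm
  from tvm.script import tir as T

  @T.prim_func
  def copy(a: T.handle, b: T.handle) -> None:
      A = T.match_buffer(a, (128, 128), "float32")
      B = T.match_buffer(b, (128, 128), "float32")
      for i, j in T.grid(128, 128):
          with T.block("copy"):
              vi, vj = T.axis.remap("SS", [i, j])
              B[vi, vj] = A[vi, vj]

  sch = tvm.tir.Schedule(copy)
  block = sch.get_block("copy")
  # Rewrite the layout of the block's first write buffer (B).  The buffer
  # shape and the loads/stores of B are rewritten eagerly; per the discussion
  # above, the surrounding loops can also be rewritten because they cover
  # every value of B and the block body contains no additional computation.
  sch.transform_layout(block, ("write", 0), lambda i, j: (i // 16, j, i % 16))
  print(sch.mod.script())
  ```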

- Should buffer flattening be implemented as a special case of layout 
transformation?  No; buffer flattening should remain a separate concept from 
the layout transforms.  Whereas all other layout transformations can be 
performed eagerly, and should be applied before other scheduling passes, 
buffer flattening must be performed after them.  If flattening were done 
eagerly, other passes wouldn't have sufficient information about the structure 
of the buffer (see the sketch below).
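
  The following is a toy, pure-Python illustration (not TVM API) of the flattening step itself: it consumes the buffer's final shape and axis separators, both of which are only known after every shape-affecting scheduling pass has run, which is why it cannot be applied eagerly.

  ```python
  def flatten_indices(indices, shape, axis_separators=()):
      """Row-major flatten `indices`, producing one output index per group of
      input axes delimited by `axis_separators`."""
      groups, start = [], 0
      for stop in list(axis_separators) + [len(shape)]:
          flat = 0
          for idx, extent in zip(indices[start:stop], shape[start:stop]):
              flat = flat * extent + idx
          groups.append(flat)
          start = stop
      return groups

  # The same 4-d access, flattened to 1-d (default) or to 2-d for texture memory.
  print(flatten_indices([1, 2, 3, 0], [16, 8, 4, 4]))                       # [172]
  print(flatten_indices([1, 2, 3, 0], [16, 8, 4, 4], axis_separators=[2]))  # [10, 12]
  ```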

- Is it acceptable to deprecate Store/Load, using BufferStore/BufferLoad 
throughout all lowering steps?  Yes, as this gives a single uniform way to 
access buffers, regardless of the lowering step.  The one concern is that we 
should port all existing functionality.  For example, the vload/vstore methods 
in Buffer, which currently return Load/Store respectively, should not be 
removed, and instead should be updated to return flattened 
BufferLoad/BufferStore.
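
  Here is a small sketch of the intended `vload` behavior described above; the exact node type returned depends on the TVM version in use, so the comment reflects the expected direction rather than guaranteed output.

  ```python
  import tvm

  buf = tvm.tir.decl_buffer((16, 16), "float32", name="A")
  access = buf.vload([2, 3])
  # Previously this produced a tir.Load with a flattened index expression;
  # after this RFC it should instead produce a (flattened) tir.BufferLoad.
  print(type(access), access)
  ```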

- RampNode should be treated as a compiler internal, and shouldn't be easily 
constructible by users as indices into buffers.  The preferred method to 
represent vectorized access is to have a buffer access within a vectorized 
loop, then allow `tir.transform.VectorizeLoop` to insert the RampNode.  This 
matches previous behavior, where RampNode could occur in flattened Store/Load, 
while BufferLoad/BufferStore avoided RampNodes to maintain easy analysis of 
accessed locations.
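
  A minimal sketch of the preferred flow: the user writes a plain buffer store inside a vectorized loop, and `tir.transform.VectorizeLoop` is what introduces the Ramp index.  The printed output may vary with TVM version.

  ```python
  import tvm
  from tvm.script import tir as T

  @T.prim_func
  def fill(a: T.handle) -> None:
      A = T.match_buffer(a, (4,), "float32")
      # A scalar store inside a vectorized loop; no Ramp is written by the user.
      for i in T.vectorized(4):
          A[i] = T.float32(1)

  mod = tvm.IRModule({"main": fill})
  # VectorizeLoop replaces the loop with a single vector store whose index is
  # a Ramp node inserted by the pass itself.
  print(tvm.tir.transform.VectorizeLoop()(mod).script())
  ```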

- Passes that change buffer dimensionality (e.g. InjectDoubleBuffer) should 
either be moved before StorageFlatten/FlattenBuffer, or should be rewritten to 
resize the buffer rather than changing its dimensionality.  The former would 
require the pass to also update the axis separators to be used when flattening 
(see the sketch below).
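
  Illustrative only: the two options discussed, shown for a double-buffered allocation whose original shape is `(N,)`.  Neither declaration below is the actual InjectDoubleBuffer implementation.

  ```python
  import tvm

  N = 128
  original = tvm.tir.decl_buffer((N,), "float32", name="buf")

  # Option 1: add a leading axis.  The dimensionality changes, so the pass
  # must run before flattening and must update the axis separators.
  with_extra_axis = tvm.tir.decl_buffer((2, N), "float32", name="buf.db")

  # Option 2: double the existing extent.  The dimensionality is preserved,
  # so the pass can continue to run after flattening.
  resized = tvm.tir.decl_buffer((2 * N,), "float32", name="buf.db")
  ```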
