Following a video chat discussion with @vinx13, we touched on a number of points, summarized below. Also, we are adding @vinx13 as a co-author on this RFC.
- Are there cases where the flattening in `StorageFlatten`/`FlattenBuffer` should be inferred from buffer properties, rather than explicitly specified by the user? For example, if a buffer has `"texture"` scope, then we know it must be flattened to a 2-d buffer.
  - We concluded that this wouldn't be possible, because the number of resulting dimensions isn't sufficient to define the flattening being applied. For example, if a 4-d buffer is being flattened to 2-d for use in texture memory, the four initial axes `[A, B, C, D]` could be flattened to `[A, fuse(B,C,D)]`, `[fuse(A,B), fuse(C,D)]`, or `[fuse(A,B,C), D]`, with no option clearly better or worse than the others.
- How will buffer layout transformations be represented in TensorIR schedules?
  - `buffer_transform` will be a primitive transformation in TensorIR, which is eagerly applied on the TensorIR computation.
  - In all cases, this would rewrite the buffer shape, and would rewrite loads/stores of that buffer.
  - If these loads/stores occur within a series of nested loops that cover all values of the buffer, and have no additional computation (e.g. cache read/write) in the body of these loops, then the loops will be rewritten to be along the transformed axes, so that the remainder of the schedule can be written in terms of the transformed axes. Otherwise, rewriting the loops would not be well-defined, and will not be done.
  - The recommendation for use will be to apply the layout transformations prior to any other scheduling passes that could impact the loop structure, so that rewriting of the loops is possible.
- Should buffer flattening be implemented as a special case of layout transformation?
  - Buffer flattening should remain a separate concept from the layout transforms. Whereas all other layout transformations can be performed eagerly, and should be applied before other scheduling passes, buffer flattening must be performed after other scheduling passes. If it were done eagerly, other passes wouldn't have sufficient information about the structure of the buffer.
- Is deprecating `Store`/`Load` acceptable, instead using `BufferStore`/`BufferLoad` throughout all lowering steps?
  - Yes, as this gives a single uniform way to access buffers, regardless of the lowering step. The one concern is that we should port all existing functionality. For example, the `vload`/`vstore` methods in `Buffer`, which currently return `Load`/`Store` respectively, should not be removed, and instead should be updated to return flattened `BufferLoad`/`BufferStore`.
- `RampNode` should be treated as a compiler internal, and shouldn't be easily constructible by users as indices into buffers. The preferred method to represent vectorized access is to have a buffer access within a vectorized loop, then allow `tir.transform.VectorizeLoop` to insert the `RampNode`. This matches previous behavior, where `RampNode` could occur in flattened `Store`/`Load`, while `BufferLoad`/`BufferStore` avoided `RampNode`s to maintain easy analysis of accessed locations.
- Passes that change buffer dimensionality (e.g. `InjectDoubleBuffer`) should either be moved before `StorageFlatten`/`FlattenBuffer`, or should be rewritten to instead resize the buffer, rather than changing the dimensionality. The former would require the pass to also update the axis separators to be used when flattening.
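
To make the ambiguity in the first point concrete, here is a rough sketch in plain Python (not TVM code) that enumerates the 2-d flattenings of a 4-d buffer. It assumes, for illustration only, that a flattening is described by a tuple of axis-separator positions, where each separator marks a boundary between groups of fused axes. The helper names `flatten_shape`, `flatten_index`, and `two_d_flattenings` and the example shape are hypothetical and do not correspond to any existing API.

```python
from math import prod


def flatten_shape(shape, axis_separators):
    """Fuse the axes of `shape` into groups, split at each separator position."""
    bounds = [0, *axis_separators, len(shape)]
    return [prod(shape[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def flatten_index(shape, index, axis_separators):
    """Map an N-d index to the flattened index, using row-major order per group."""
    bounds = [0, *axis_separators, len(shape)]
    flat = []
    for lo, hi in zip(bounds, bounds[1:]):
        offset = 0
        for extent, i in zip(shape[lo:hi], index[lo:hi]):
            offset = offset * extent + i
        flat.append(offset)
    return flat


def two_d_flattenings(ndim):
    """Every way to place a single separator, giving a 2-d result."""
    return [(sep,) for sep in range(1, ndim)]


# Hypothetical 4-d buffer with axes [A, B, C, D] and one access into it.
shape = [2, 3, 4, 5]
index = [1, 2, 3, 4]

for seps in two_d_flattenings(len(shape)):
    print(seps, flatten_shape(shape, seps), flatten_index(shape, index, seps))
# (1,) [2, 60] [1, 59]   -> [A, fuse(B,C,D)]
# (2,) [6, 20] [5, 19]   -> [fuse(A,B), fuse(C,D)]
# (3,) [24, 5] [23, 4]   -> [fuse(A,B,C), D]
```

All three results are valid 2-d buffers, so knowing only that the output must be 2-d does not select one of them. This is why the separator positions would need to come from the user rather than being inferred from the `"texture"` scope.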