Thanks @Lunderberg for the RFC. Logical-physical mapping is definitely an important feature. I also implemented something similar for warp memory to support tensor core instructions on GPU, and I'm happy to collaborate more to converge on a unified design. Some preliminary comments:

The current representation of the logical-physical layout mapping uses an array of axis/factor pairs to define how the logical axes are split/reordered/fused to form the physical axes. This works for packed layouts like `NCHW4c`, but we might need to think about whether it is a general enough way to represent the mapping. For example, another option is to use a mapping function: `(n, c, h, w) -> (n, tir.floordiv(c, 4), h, w, tir.floormod(c, 4))`. This would allow arbitrary mappings (though we could add restrictions, such as requiring the mapping to be affine, to make analysis easier). A possible use case for a more complex mapping is the [permuted layout](https://github.com/NVIDIA/cutlass/blob/master/media/docs/implicit_gemm_convolution.md#shared-memory-layouts) used for shared memory on CUDA.

Also, there is related [affine analysis infrastructure](https://github.com/apache/tvm/blob/main/include/tvm/arith/iter_affine_map.h) already available; it would be great if we could reuse it for loop analysis and rewriting.
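To make the mapping-function idea concrete, here is a minimal sketch (not part of the RFC, and the helper name `nchw_to_nchw4c` is just for illustration) showing how the `NCHW` to `NCHW4c` mapping could be expressed as a plain Python function that returns TIR index expressions:

```python
import tvm
from tvm import tir

def nchw_to_nchw4c(n, c, h, w):
    # Hypothetical mapping function: logical NCHW indices -> physical
    # NCHW4c indices, built from ordinary TIR arithmetic expressions.
    return [n, tir.floordiv(c, 4), h, w, tir.floormod(c, 4)]

# Apply the mapping to symbolic indices to inspect the physical layout.
logical = [tir.Var(name, "int32") for name in ("n", "c", "h", "w")]
physical = nchw_to_nchw4c(*logical)
print(physical)  # [n, floordiv(c, 4), h, w, floormod(c, 4)]
```

Since the result is just a list of TIR expressions, the same representation could in principle cover non-packed mappings (e.g. the permuted shared-memory layout above), and affine cases could still be recognized and analyzed with the existing iterator-affine-map utilities.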