Thanks @Lunderberg for the RFC. Logical-physical mapping is definitely an
important feature. I also implemented something similar for warp memory to
support tensor core instructions on GPU, and I'm happy to collaborate further to
get a unified design.
Some preliminary comments:
The current representation of the logical-physical layout mapping is an
array of axis/factor pairs that defines how the logical axes are
split/reordered/fused to form the physical axes. This works for packed layouts
like `NCHW4c`, but we might need to consider whether it is a generic enough way
to represent the mapping. For example, another option is a mapping function:
`(n, c, h, w) -> (n, tir.floordiv(c, 4), h, w, tir.floormod(c, 4))`. This would
allow arbitrary mappings (though we could add restrictions, such as requiring an
affine mapping, to make analysis easier). A possible use case for a more complex
mapping is the [permuted
layout](https://github.com/NVIDIA/cutlass/blob/master/media/docs/implicit_gemm_convolution.md#shared-memory-layouts)
 for shared memory on CUDA.
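
To make the comparison concrete, here is a minimal sketch of the mapping-function form, written with today's `tir` expression builders (the function name is illustrative, not part of the RFC):

```python
from tvm import tir

# Illustrative sketch: the NCHW -> NCHW4c mapping expressed as a function from
# logical indices to physical indices, instead of an axis/factor list.
def nchw_to_nchw4c(n, c, h, w):
    return [n, tir.floordiv(c, 4), h, w, tir.floormod(c, 4)]

# Applying it to symbolic indices yields the physical index expressions.
n, c, h, w = [tir.Var(name, "int32") for name in "nchw"]
print(nchw_to_nchw4c(n, c, h, w))
# e.g. [n, floordiv(c, 4), h, w, floormod(c, 4)]
```

When the function only uses `floordiv`/`floormod` by constant factors, it degenerates to the split/reorder form in the current RFC, so the function form would be a strict generalization.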
Also, there is related [affine analysis
infrastructure](https://github.com/apache/tvm/blob/main/include/tvm/arith/iter_affine_map.h)
 available; it would be great if we could reuse it for loop analysis and
rewriting.
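
As a rough sketch of what such reuse could look like (the concrete extents below are placeholders, and the exact return type of `detect_iter_map` differs a bit across TVM versions), the physical index expressions of the `NCHW4c` mapping can be fed into the existing iterator affine map detection:

```python
import tvm
from tvm import tir

# Logical iterators and their ranges (placeholder extents).
n, c, h, w = [tir.Var(name, "int32") for name in "nchw"]
input_iters = {
    n: tvm.ir.Range(0, 1),
    c: tvm.ir.Range(0, 64),
    h: tvm.ir.Range(0, 56),
    w: tvm.ir.Range(0, 56),
}

# Physical indices produced by the NCHW -> NCHW4c mapping.
indices = [n, tir.floordiv(c, 4), h, w, tir.floormod(c, 4)]

# detect_iter_map recovers how each physical index splits/fuses the logical
# iterators, which is the kind of analysis loop rewriting would need.
print(tvm.arith.detect_iter_map(indices, input_iters))
```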
