What I mean is that the runtime needs to be aware of the memory layout and provide ```out[slice] = f(inputs)```. Another possible "obstacle" is that TVM's compute kernels require the buffer to be somewhat aligned, so we need to generate a special kernel for ```out[slice] = f(inputs)``` with a known offset (so we still benefit from good alignment). This is necessary for OpenCL.
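
To make the idea concrete, here is a rough sketch in plain Python (not TVM code). The names `make_slice_kernel`, `is_aligned`, and the 64-byte `ALIGNMENT_BYTES` value are all hypothetical, chosen only for illustration. It shows a kernel specialized for a known offset that writes `f(inputs)` directly into a slice of a larger output buffer, plus a check that the slice start is still aligned.

```python
from array import array

# Assumed alignment requirement in bytes; illustrative, not a TVM constant.
ALIGNMENT_BYTES = 64


def is_aligned(offset_elems, elem_bytes, alignment=ALIGNMENT_BYTES):
    """Return True if the slice start, in bytes, is a multiple of `alignment`."""
    return (offset_elems * elem_bytes) % alignment == 0


def make_slice_kernel(f, offset, length):
    """Build a kernel specialized for a known offset.

    The returned kernel computes out[offset:offset + length] = f(inputs)
    elementwise, writing straight into the larger output buffer.
    """
    def kernel(out, *inputs):
        for i in range(length):
            out[offset + i] = f(*(x[i] for x in inputs))
    return kernel


# One shared 32-element float32 output buffer; we fill only its second half.
out = array("f", [0.0] * 32)
a = array("f", [float(i) for i in range(16)])
b = array("f", [1.0] * 16)

# The offset is known when the kernel is generated, so alignment can be
# checked up front instead of at every call.
offset = 16
offset_ok = is_aligned(offset, out.itemsize)

add_into_slice = make_slice_kernel(lambda x, y: x + y, offset=offset, length=16)
add_into_slice(out, a, b)
```

Because the offset is fixed at generation time, a real code generator could pick the aligned code path once, rather than handling an arbitrary runtime offset.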