To clarify a bit, we do not have to ask for everything to be done as a form of
schedule, so it is OK, for example, to generate a compute definition that already
contains packing (you can view that as one special dispatch pass).
The main ask is that the TIR schedule pass should detect the already packed

Thanks for taking a look @tqchen! Since scheduling will be completed with
TensorIR, it will provide building blocks that can be plugged into an
IRModule=>IRModule transformation pass. For our current use-case, it's
important to be able to fall back to previous optimizations in the form of TE
s