@FrozenGene Thanks for the quick feedback on the design. I understand the performance concern. Let's try to tackle them in fusion. Fusion already performs compute_inline to bring the computation at right location. Hopefully, with some tagging and with some arm-twisting, we can achieve same tensorize schedule that you are suggesting.
-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-508790037