What exactly do you mean by "imperfect loop tiling"?
On the first issue, tensorization essentially lets us inline high-performance code that implements the inner-loop body of a matrix-matrix or matrix-vector multiplication. This is very useful when targeting special hardware intrinsics, such as an AVX-512-based GEMV or an accelerator's tensor core ISA, or when using neat tricks like bit-serial operations with vectorized popcount on ARM CPUs.
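
To make the idea concrete, here is a rough conceptual sketch in plain Python. It is not TVM's tensorize API: `gemv_intrinsic`, `matmul_scalar`, and `matmul_tensorized` are hypothetical names made up for illustration, and the "intrinsic" is just a Python function standing in for a fixed-shape hardware GEMV. The sketch also assumes the number of output columns is a multiple of the tile size.

```python
# Conceptual sketch only: plain Python, not TVM's tensorize API.
# All function names here are hypothetical, chosen for illustration.


def gemv_intrinsic(w_tile, x):
    """Stand-in for a hardware GEMV intrinsic: returns w_tile @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in w_tile]


def matmul_scalar(A, B_T):
    """Reference loop nest: C[i][j] = sum_k A[i][k] * B_T[j][k]."""
    n, m, depth = len(A), len(B_T), len(A[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(depth):
                C[i][j] += A[i][k] * B_T[j][k]
    return C


def matmul_tensorized(A, B_T, tile=2):
    """Same result, with the inner j/k loops replaced by intrinsic calls."""
    n, m = len(A), len(B_T)
    if m % tile != 0:
        # Assumption of this sketch: only full tiles are supported.
        raise ValueError("column count must be a multiple of the tile size")
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j0 in range(0, m, tile):
            # One intrinsic call computes a whole tile of outputs, replacing
            # the scalar loops over j (within the tile) and k.
            C[i][j0:j0 + tile] = gemv_intrinsic(B_T[j0:j0 + tile], A[i])
    return C


A = [[1, 2, 3], [4, 5, 6]]
B_T = [[1, 0, 1], [0, 1, 0], [2, 2, 2], [1, 1, 1]]  # B stored transposed
result = matmul_tensorized(A, B_T, tile=2)
```

The outer loops stay as ordinary loops, and only the innermost body is swapped for the hand-written kernel. On real hardware that kernel would be an intrinsic or an inline assembly sequence rather than a Python function.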