That's another problem: AVX-512 instructions are mostly 1-D vector operations, so they often do not care about the tensor shape. (I hope my assertion is correct.)
The offloaded intrinsic still requires the shape of a small tensor, which makes the intrinsic definition ad hoc. Sometimes, as with the NCHWxc layout, it is a cross-dimension op. Other times, it is a simple 1-D operation. Once shape is introduced, it is hard to find one definition that fits all cases.
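To make the contrast concrete, here is a minimal pure-Python sketch. It is not TVM code, and `vec_add`, `add_any_shape`, and `pack_nchw_to_nchwxc` are hypothetical names used only for illustration. `vec_add` stands in for a single 16-lane AVX-512 fp32 add and only ever sees a flat chunk, so the same routine serves tensors of any shape. The NCHWxc repacking, by contrast, shows that each innermost vector gathers values from `x` different channels. An intrinsic over that vector therefore spans the C dimension, and its definition has to encode the layout.

```python
from itertools import product

LANES = 16  # a 512-bit register holds 16 fp32 lanes


def vec_add(a, b):
    """Hypothetical stand-in for one 1-D SIMD add on exactly LANES elements."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]


def flatten(t):
    """Flatten a nested-list tensor of any rank into a 1-D list."""
    if not isinstance(t, list):
        return [t]
    out = []
    for sub in t:
        out.extend(flatten(sub))
    return out


def add_any_shape(a, b):
    """Elementwise add of two equally sized tensors via the 1-D intrinsic.

    The shape is irrelevant: data is treated as a flat buffer and processed
    LANES elements at a time (length assumed to be a multiple of LANES to
    keep the sketch short)."""
    fa, fb = flatten(a), flatten(b)
    assert len(fa) == len(fb) and len(fa) % LANES == 0
    out = []
    for i in range(0, len(fa), LANES):
        out.extend(vec_add(fa[i:i + LANES], fb[i:i + LANES]))
    return out


def pack_nchw_to_nchwxc(data, n, c, h, w, x):
    """Repack a flat NCHW buffer into NCHWxc (x channels innermost).

    Each innermost x-vector collects elements from x different channels,
    so an intrinsic operating on it crosses the C dimension."""
    assert c % x == 0
    packed = [0] * (n * c * h * w)
    for ni, ci, hi, wi in product(range(n), range(c), range(h), range(w)):
        src = ((ni * c + ci) * h + hi) * w + wi
        co, cx = divmod(ci, x)  # outer channel block, lane within block
        dst = (((ni * (c // x) + co) * h + hi) * w + wi) * x + cx
        packed[dst] = data[src]
    return packed
```

For example, `add_any_shape` gives the same flat result for a 2x16 and a 4x8 tensor holding the same data, while `pack_nchw_to_nchwxc(list(range(8)), 1, 4, 1, 2, 2)` places channels 0 and 1 side by side in the first innermost vector.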