When you want an advanced optimization such as register blocking, TVM's codegen does not handle it very well. In my experience there are two options:

1. Write a micro GEMM kernel such as `4x4` or `8x8`, then tensorize it.
2. Try, try, and try different schedules until you find a combination that matches your expectations. This is very painful.

Maybe TensorIR, as @junrushao1994 mentioned, could handle this better, but I don't think it can completely solve the problem of low-level, fine-grained control.
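For readers unfamiliar with the term, here is a rough sketch of what a register-blocked micro GEMM computes. It is plain Python, not TVM API and not generated code; the names `micro_gemm`, `blocked_matmul`, `MR`, and `NR` are made up for illustration, and the `4x4` tile size is just one of the sizes mentioned above. In a real kernel the accumulator tile would live in registers, and the micro kernel is the piece you would hand-write and plug in via tensorize.

```python
# Illustrative only: names and tile sizes are assumptions, not TVM API.
MR, NR = 4, 4  # micro-tile shape (the "4x4" case)


def micro_gemm(A, B, C, i0, j0, K):
    """Update the MR x NR tile of C at (i0, j0) with A[i0:i0+MR, :K] @ B[:K, j0:j0+NR]."""
    # The accumulator tile stands in for the register block: it is reused
    # across the whole K loop and written back to C only once at the end.
    acc = [[0.0] * NR for _ in range(MR)]
    for k in range(K):
        a_col = [A[i0 + i][k] for i in range(MR)]  # MR values of A, loaded once per k
        b_row = B[k][j0:j0 + NR]                   # NR values of B, loaded once per k
        for i in range(MR):
            for j in range(NR):
                acc[i][j] += a_col[i] * b_row[j]
    for i in range(MR):
        for j in range(NR):
            C[i0 + i][j0 + j] += acc[i][j]


def blocked_matmul(A, B):
    """Outer loops walk over tiles; all the arithmetic happens in micro_gemm."""
    M, K, N = len(A), len(B), len(B[0])
    assert M % MR == 0 and N % NR == 0, "sketch assumes sizes divisible by the tile"
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, MR):
        for j0 in range(0, N, NR):
            micro_gemm(A, B, C, i0, j0, K)
    return C


def naive_matmul(A, B):
    """Reference triple loop, used only to check the blocked version."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]
```

The point of the split is that the outer tile loops are what a schedule expresses easily, while the body of `micro_gemm` is where you want exact control over loads and accumulators.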