@jcf94 has explained the Strassen algorithm very well. I wrote the post you linked. Note, however, that the post is not meant to show the best performance TVM can achieve; it only shows how easily TVM can reach reasonable performance (beyond NumPy).
If we want to improve performance further, there is still room to dig. For example, we could add `cache_write` for the matmul output stage, add an `auto_unroll` configuration, and so on. However, I think this should be handled by our AutoTVM v2.0 (Auto Scheduler), so you could try the auto scheduler. Simple matmul using topi should be fully upstreamed already, right? cc @jcf94
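
To make the `cache_write` idea concrete, here is a rough plain-Python sketch of what that kind of transformation does conceptually for the matmul output stage: each output tile is accumulated in a small local buffer and written back to the result once. This is not TVM code and not the schedule from the linked post; the function names and the `tile` parameter are hypothetical, chosen only for illustration.

```python
def matmul_naive(a, b):
    """Reference matmul: every partial sum is written straight into c."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_cached_write(a, b, tile=2):
    """Tiled matmul that accumulates each output tile in a local buffer.

    Conceptual stand-in for a write cache on the output stage
    (hypothetical helper, not a TVM API).
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            ih = min(tile, n - i0)  # tile height (handles ragged edges)
            jw = min(tile, m - j0)  # tile width
            # Local accumulator: the "cache" for this output tile.
            local = [[0.0] * jw for _ in range(ih)]
            for p in range(k):
                for i in range(ih):
                    a_ip = a[i0 + i][p]
                    for j in range(jw):
                        local[i][j] += a_ip * b[p][j0 + j]
            # Single write-back of the finished tile into the output.
            for i in range(ih):
                for j in range(jw):
                    c[i0 + i][j0 + j] = local[i][j]
    return c
```

In a real schedule the local buffer would live in registers or a fast cache level, and unrolling the small inner loops over the tile is where an `auto_unroll` setting would come in. The auto scheduler is meant to find this kind of structure without hand tuning.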