In your case, the current code uses 4 cores (ids 0 to 3), so running in parallel gives you better performance.
As for the time-consuming functions: do you use AutoTVM? If you do, the CPU that TVM uses by default is the big core (index 7). If you decide to run on the 4 little cores, you should make AutoTVM tune on those same 4 little cores too. The elegant solution would be a `thread_mod` option that users can set (see https://discuss.tvm.apache.org/t/autotvm-rpcrunner-and-tvm-num-threads/3534/11?u=frozengene). As a workaround for now, you can temporarily disable cores 4, 5, 6, and 7 on the device. (We do need to provide an interface that lets users control big / little cores when tuning.)
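
A minimal sketch of the workaround, assuming the device runs Linux or Android, exposes the standard sysfs CPU hotplug files, and gives you a root shell. The helper name `core_toggle_commands` and the `BIG_CORES` list are hypothetical and only for illustration. The script just builds the shell commands; you would run them on the device yourself before tuning and bring the cores back online afterwards.

```python
# Assumption: cores 4-7 are the big cluster, as described in this post.
BIG_CORES = [4, 5, 6, 7]


def core_toggle_commands(core_ids, online):
    """Build shell commands that switch CPU cores on or off through sysfs.

    Writing 0 to /sys/devices/system/cpu/cpuN/online takes core N offline,
    writing 1 brings it back. Root access on the device is required.
    """
    value = 1 if online else 0
    return [
        f"echo {value} > /sys/devices/system/cpu/cpu{core}/online"
        for core in core_ids
    ]


if __name__ == "__main__":
    # Commands to run on the device before tuning (big cores off).
    for command in core_toggle_commands(BIG_CORES, online=False):
        print(command)
    # Commands to run on the device after tuning (big cores back on).
    for command in core_toggle_commands(BIG_CORES, online=True):
        print(command)
```

Some devices run a vendor hotplug or thermal service that brings cores back online by itself, so it is worth checking the `online` files again right before tuning starts.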