Thank you very much for your reply.
As I said before, I am following this tutorial to deploy TVM: https://tvm.apache.org/docs/deploy/cpp_deploy.html. I first export the function built by tvm.build as a library, then load and call it from C++.

Following your suggestion, I set the CPU affinity like this before calling tvm::runtime::Module::LoadFromFile:

> tvm::runtime::threading::ThreadGroup::AffinityMode mode =
> static_cast<tvm::runtime::threading::ThreadGroup::AffinityMode>(static_cast<int>(-1));
> tvm::runtime::ThreadPool::ThreadLocal()->UpdateWorkerConfiguration(mode, 4);

The frequency of each of my CPU cores is shown below:

> index: 7 freqs: 3130000
> index: 4 freqs: 2544000
> index: 5 freqs: 2544000
> index: 6 freqs: 2544000
> index: 0 freqs: 2045000
> index: 1 freqs: 2045000
> index: 2 freqs: 2045000
> index: 3 freqs: 2045000

Then I unset TVM_NUM_THREADS and ran the test many times. Compared with before (TVM_NUM_THREADS=1), the performance is indeed better. However, the run time fluctuates a lot: for the 256 * 256 * 256 case, the fastest run takes 1745 us and the slowest takes 10971 us.
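
To make the fluctuation easier to compare between runs, here is a minimal sketch of how the per-call time could be collected and summarized as min / median / max. It is not the exact code I ran: the names LatencyStats, Summarize and Benchmark are hypothetical helpers made up for illustration, and the callable passed to Benchmark is assumed to wrap the call to the function loaded from the exported library. A few untimed warm-up calls are included on the assumption that the first calls pay one-time costs such as thread pool start-up.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper type: summary of per-call latencies in microseconds.
struct LatencyStats {
  double min_us;
  double median_us;
  double max_us;
};

// Computes min / median / max of a non-empty set of samples.
LatencyStats Summarize(std::vector<double> samples_us) {
  assert(!samples_us.empty());
  std::sort(samples_us.begin(), samples_us.end());
  const std::size_t n = samples_us.size();
  // For an even count, take the mean of the two middle samples.
  const double median =
      (n % 2 == 1) ? samples_us[n / 2]
                   : 0.5 * (samples_us[n / 2 - 1] + samples_us[n / 2]);
  return LatencyStats{samples_us.front(), median, samples_us.back()};
}

// Calls `fn` `warmup` times without timing, then `repeats` times with timing.
// `fn` is assumed to wrap the call into the loaded library function.
LatencyStats Benchmark(const std::function<void()>& fn, int warmup, int repeats) {
  assert(repeats > 0);
  for (int i = 0; i < warmup; ++i) {
    fn();
  }
  std::vector<double> samples_us;
  samples_us.reserve(static_cast<std::size_t>(repeats));
  for (int i = 0; i < repeats; ++i) {
    // steady_clock is monotonic, so wall-clock adjustments do not skew samples.
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    samples_us.push_back(
        std::chrono::duration<double, std::micro>(t1 - t0).count());
  }
  return Summarize(std::move(samples_us));
}
```

Reporting the median next to the min and max should show whether the slow runs are rare outliers or whether most runs are slow, which the min and max alone cannot tell.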