* Device: Skylake 8163 with 48 physical cores
* Env Setting: TVM_BIND_THREADS=0 TVM_NUM_THREADS=4
* Code Snippet:
```
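from tvm.contrib import graph_executor

# lib, ctx, repeats, and num_threads are defined elsewhere (not shown).
module = graph_executor.GraphModule(lib["default"](ctx))

def thread_run():
    # Every thread calls run() on the same shared module.
    for _ in range(repeats):
        module.run()

threads = []
for _ in range(num_threads):
    # PropagatingThread is a user-defined Thread subclass (definition not shown).
    threads.append(PropagatingThread(
        target=thread_run,
    ))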
```
* When num_threads=1

4 physical cores are occupied by the TVM thread pool.
Each module.run() takes 4 ms.
* When num_threads=2

8 physical cores are occupied by the TVM thread pools.
Each module.run() takes 8 ms.
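
For reference, per-call latencies like the ones above could be collected with a harness along these lines. This is only a minimal sketch: `measure_mean_latency_ms` and `fake_run` are hypothetical names, `fake_run` is a sleep-based placeholder so the code runs without TVM, and the `PropagatingThread` shown here is an assumed stand-in for the helper used in the snippet (a thread that re-raises its target's exception on `join()`), not its actual definition.

```python
import threading
import time


class PropagatingThread(threading.Thread):
    """Assumed stand-in: re-raises the target's exception when joined."""

    def __init__(self, target, args=()):
        super().__init__()
        self._fn, self._fn_args = target, args
        self.exc = None

    def run(self):
        try:
            self._fn(*self._fn_args)
        except BaseException as exc:  # keep the exception for join()
            self.exc = exc

    def join(self, timeout=None):
        super().join(timeout)
        if self.exc is not None:
            raise self.exc


def measure_mean_latency_ms(run_once, num_threads, repeats):
    """Call run_once() `repeats` times in each thread; return the mean
    latency per call (in ms) observed by each thread."""
    means = [0.0] * num_threads

    def thread_run(idx):
        start = time.perf_counter()
        for _ in range(repeats):
            run_once()
        means[idx] = (time.perf_counter() - start) * 1e3 / repeats

    threads = [PropagatingThread(target=thread_run, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return means


def fake_run():
    # Placeholder for module.run(): sleeps about 1 ms.
    time.sleep(0.001)
```

With TVM available, `measure_mean_latency_ms(module.run, num_threads=2, repeats=100)` would time the shared module from the snippet; passing `fake_run` instead exercises only the harness itself.
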
Since each thread in the 2-threaded run still has 4 physical cores to itself,
its performance is expected to match the single-threaded run. However, each
thread in the 2-threaded run reaches only 50% of the single-threaded
performance, each thread in a 4-threaded run only 25%, and so on.
Any idea what causes this performance degradation in multi-threaded runs?
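
One way to read these numbers, offered as a sketch and not a diagnosis, is to convert them to aggregate throughput. The helper name `aggregate_throughput` is illustrative, and the 16 ms figure for 4 threads is inferred from the "25%" statement, not a separately reported measurement.

```python
def aggregate_throughput(num_threads, latency_ms):
    """Total module.run() calls completed per second across all threads."""
    return num_threads * 1000.0 / latency_ms


# Per-call latency in ms, keyed by num_threads.
# 4 ms and 8 ms are the reported values; 16 ms is inferred from "25%".
reported = {1: 4.0, 2: 8.0, 4: 16.0}

for n, ms in reported.items():
    print(f"{n} thread(s): {ms:.0f} ms/run -> "
          f"{aggregate_throughput(n, ms):.0f} runs/s")
```

Every row comes out at 250 runs/s, so adding threads does not increase total throughput. That is the pattern one would see if the runs were effectively executing one at a time, although the arithmetic alone does not say where such serialization would come from.
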