* Device: Skylake 8163 with 48 physical cores

* Env Setting: TVM_BIND_THREADS=0 TVM_NUM_THREADS=4

* Code Snippet:
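```python
from tvm.contrib import graph_executor

# lib, ctx, repeats, and num_threads are set up earlier (not shown)
module = graph_executor.GraphModule(lib["default"](ctx))

def thread_run():
    # Each thread calls run() on the shared module `repeats` times
    for i in range(repeats):
        module.run()

threads = []
for i in range(num_threads):
    threads.append(PropagatingThread(
        target=thread_run,
    ))
# Start all threads, then wait for them to finish
for t in threads:
    t.start()
for t in threads:
    t.join()
```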
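`PropagatingThread` is not part of TVM or the Python standard library, and its definition is not shown above. A minimal sketch of such a helper, assuming it is the usual "thread that re-raises the target's exception on `join()`" pattern (the attribute names here are illustrative):

```python
import threading


class PropagatingThread(threading.Thread):
    """Thread that re-raises any exception from its target when joined."""

    def __init__(self, target, args=(), kwargs=None):
        super().__init__()
        self._fn = target
        self._fn_args = args
        self._fn_kwargs = kwargs or {}
        self.exc = None   # exception raised by the target, if any
        self.ret = None   # return value of the target, if it finished

    def run(self):
        try:
            self.ret = self._fn(*self._fn_args, **self._fn_kwargs)
        except BaseException as e:
            # Store the exception so join() can surface it in the caller
            self.exc = e

    def join(self, timeout=None):
        super().join(timeout)
        if self.exc is not None:
            raise self.exc
        return self.ret
```

With this helper, a failure inside one worker surfaces in the main thread at `join()` instead of being silently printed and ignored.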

* When num_threads=1
![A1F69032-BCEC-46B1-8118-493FD1CB2F4A|690x243](upload://dp2QT4TXdrbFRbv6yymzxVzedX5.jpeg)
 
4 physical cores are occupied by the TVM thread pool.
Each module.run() takes 4 ms.

* When num_threads=2
![7A20D611-CD6E-4979-9FEB-F4513845C1B7|690x245](upload://pF3As7eLVn24DwiT0Lq06j0W9zx.jpeg)
 
8 physical cores are occupied by the TVM thread pools.
Each module.run() takes 8 ms.

Since each thread in the 2-threaded run still occupies 4 physical cores, I
expected its performance to match the single-threaded run. But each thread in
the 2-threaded run actually reaches only 50% of the single-threaded
performance, a 4-threaded run only 25%, and so on.
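To put numbers on this, here is a rough sketch of how the per-thread latency and the aggregate throughput could be measured. `measure_per_call_ms` and `aggregate_throughput` are hypothetical helper names, and `run_fn` stands in for `module.run`; none of this comes from TVM itself.

```python
import threading
import time


def measure_per_call_ms(run_fn, num_threads, repeats):
    """Return the mean latency in ms of run_fn, one entry per thread."""
    results = [0.0] * num_threads
    barrier = threading.Barrier(num_threads)

    def worker(idx):
        barrier.wait()  # release all threads at the same moment
        start = time.perf_counter()
        for _ in range(repeats):
            run_fn()
        elapsed = time.perf_counter() - start
        results[idx] = elapsed * 1000.0 / repeats

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def aggregate_throughput(latencies_ms):
    """Total runs per second across threads, from per-thread mean latency."""
    return sum(1000.0 / ms for ms in latencies_ms)
```

Plugging in the figures above, `aggregate_throughput([4.0])` and `aggregate_throughput([8.0, 8.0])` both give 250 runs per second, so total throughput stays flat as threads are added even though more cores are busy.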

Any idea what causes this performance degradation in multi-threaded runs?




