https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96844
Bug ID: 96844
Summary: OpenMP: two worksharing constructs with different num_threads clauses break thread pooling
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: mority at posteo dot net
Target Milestone: ---

Created attachment 49154
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49154&action=edit
Code that produces bug

Hi,

if a for loop contains two OpenMP worksharing constructs which specify different values in their num_threads clauses, thread pooling does not seem to work correctly. E.g., the first worksharing construct has num_threads(2) and the second num_threads(4). The expected behavior would be that a total of 4 threads is created: the first worksharing construct uses 2 of these threads and the second uses all of them.

However, this does not seem to be the case. While thread pooling seems to work for the first worksharing construct, it fails for the second. Every time the second worksharing construct is executed, 2 new threads are created. This causes significant overhead.

For clarification: there is no nested parallelism.

The attached code can be used to reproduce the bug. It can be compiled into 4 different versions using conditional compilation:

1. no OpenMP
   gcc -O3 -I. -Wall -g -DPRINT_TID mwe2_woMPI.c -o mwe2_woMPI

2. worksharing construct foo only
   gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_FOO -fopenmp mwe2_woMPI.c -o mwe2_woMPI_foo

3. worksharing construct bar only
   gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_BAR -fopenmp mwe2_woMPI.c -o mwe2_woMPI_bar

4. both worksharing constructs
   gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_FOO -DPRAGMA_BAR -fopenmp mwe2_woMPI.c -o mwe2_woMPI_foobar

I analyzed the output of the different versions, which contains the thread id for every iteration. Each worksharing construct in isolation works correctly, and 2 or 4 threads are created, respectively.
However, if both worksharing constructs are used at the same time, the first worksharing construct uses 2 different threads and the second uses 22 different threads.

GCC versions 8.3, 9.2, and 10.2 all show this behavior. I also compiled the code with clang 10.1 and icc 19.4, which both handle the case correctly.
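
The attachment itself is not reproduced in this message. The pattern described above can be sketched roughly as follows. This is not the attached mwe2_woMPI.c: the names id_set, record_self and run_regions are hypothetical, and using pthread_self() as the thread identity is an assumption about how distinct threads might be counted. Because a pthread_t value may be reused once a thread has exited, the counts are only a lower bound on the number of threads actually created.

```c
/* Sketch only: NOT the attached mwe2_woMPI.c. All names are hypothetical. */
#include <pthread.h>

#define MAX_IDS 1024

/* Set of distinct thread handles observed inside one worksharing construct. */
typedef struct {
    pthread_t ids[MAX_IDS];
    int count;
} id_set;

/* Add the calling thread to the set unless it is already present. */
static void record_self(id_set *set)
{
    pthread_t self = pthread_self();
    for (int i = 0; i < set->count; i++)
        if (pthread_equal(set->ids[i], self))
            return;
    if (set->count < MAX_IDS)
        set->ids[set->count++] = self;
}

/*
 * Outer sequential loop containing two non-nested parallel loops with
 * different num_threads clauses, as in the report: "foo" asks for 2
 * threads, "bar" asks for 4.
 */
static void run_regions(int outer_iters, int inner_iters,
                        id_set *foo_ids, id_set *bar_ids)
{
    foo_ids->count = 0;
    bar_ids->count = 0;

    for (int it = 0; it < outer_iters; it++) {
        /* first worksharing construct ("foo") */
        #pragma omp parallel for num_threads(2)
        for (int i = 0; i < inner_iters; i++) {
            /* same critical name in both loops, so all updates are serialized */
            #pragma omp critical(record)
            record_self(foo_ids);
        }

        /* second worksharing construct ("bar") */
        #pragma omp parallel for num_threads(4)
        for (int i = 0; i < inner_iters; i++) {
            #pragma omp critical(record)
            record_self(bar_ids);
        }
    }
}
```

To use the sketch, call run_regions(10, 64, &foo, &bar) from a main function, print foo.count and bar.count, and compile with gcc -fopenmp. With working thread pooling one would expect at most 2 and at most 4 distinct threads, respectively; a much larger second count would match the behavior reported here. Without -fopenmp the pragmas are ignored and both counts are 1.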