https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96844

            Bug ID: 96844
           Summary: OpenMP: two worksharing constructs with different
                    num_threads clauses break thread pooling
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mority at posteo dot net
  Target Milestone: ---

Created attachment 49154
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49154&action=edit
Code that produces bug

Hi,

if a for loop contains two OpenMP worksharing constructs which specify
different values in their num_threads clauses, thread pooling does not seem to
work correctly.

E.g., the first worksharing construct has num_threads(2) and the second
num_threads(4). The expected behavior would be that a total of 4 threads is
created. The first worksharing construct uses 2 of these threads and the second
all of them. 

However, this does not seem to be the case. While thread pooling seems to work
for the first worksharing construct, it fails for the second. Every time the
second worksharing construct is executed, 2 new threads are created. This
causes significant overhead.

For clarification: There is no nested parallelism.
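
For illustration, the pattern described above looks roughly like the sketch
below. This is not the attached file: the names foo and bar follow the build
descriptions further down, while run_pattern, the loop bounds, and the loop
bodies are hypothetical stand-ins chosen only to make the sketch
self-contained.

```c
/* Hypothetical stand-in for the first region: requests 2 threads. */
static long foo(int n)
{
    long sum = 0;
    #pragma omp parallel for num_threads(2) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Hypothetical stand-in for the second region: requests 4 threads. */
static long bar(int n)
{
    long sum = 0;
    #pragma omp parallel for num_threads(4) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 2L * i;
    return sum;
}

/* Sequential outer loop that alternates between the two regions.
 * The regions run one after the other, so there is no nesting. */
long run_pattern(int outer, int inner)
{
    long total = 0;
    for (int it = 0; it < outer; it++) {
        total += foo(inner);   /* team of 2 */
        total += bar(inner);   /* team of 4 */
    }
    return total;
}
```

Calling run_pattern() from main and building with -fopenmp gives alternating
teams of 2 and 4 threads. Without -fopenmp the pragmas are ignored and the
code runs serially with the same result.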

The attached code can be used to reproduce the bug. The code can be compiled
into 4 different versions using conditional compilation:

1. no OpenMP
    gcc -O3 -I. -Wall -g -DPRINT_TID mwe2_woMPI.c -o mwe2_woMPI

2. worksharing construct foo only
    gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_FOO -fopenmp mwe2_woMPI.c -o mwe2_woMPI_foo

3. worksharing construct bar only
    gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_BAR -fopenmp mwe2_woMPI.c -o mwe2_woMPI_bar

4. both worksharing constructs
    gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_FOO -DPRAGMA_BAR -fopenmp mwe2_woMPI.c -o mwe2_woMPI_foobar

I analyzed the output of the different versions which contains the thread id
for every iteration. Each worksharing construct in isolation works correctly
and 2 or 4 threads are created, respectively. However, if both worksharing
constructs are used at the same time, the first worksharing construct uses 2
different threads and the second 22 different threads.
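
The number of different threads can be obtained by collecting the printed
thread ids per construct and counting the distinct values. A minimal sketch of
such a count follows. The helper name count_distinct_tids is hypothetical, and
it assumes the ids are OS-level thread ids (on Linux, for example, the value
returned by gettid) that have already been gathered into an array. The value
of omp_get_thread_num() would not work here, because it is only the index
within the current team and does not change when a new thread is created.

```c
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison of two long values for qsort. */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Sorts ids in place and returns the number of distinct values. */
size_t count_distinct_tids(long *ids, size_t n)
{
    if (n == 0)
        return 0;
    qsort(ids, n, sizeof ids[0], cmp_long);
    size_t distinct = 1;
    for (size_t i = 1; i < n; i++)
        if (ids[i] != ids[i - 1])
            distinct++;
    return distinct;
}
```

With working thread pooling, the count for a construct should equal the value
in its num_threads clause no matter how often the construct is executed.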

GCC versions 8.3, 9.2, and 10.2 all show this behavior. I also compiled the
code with clang 10.1 and icc 19.4, which both handle the case correctly.
