https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118694

--- Comment #6 from Tobias Burnus <burnus at gcc dot gnu.org> ---
> Are we required to diagnose this as an error
> or is it allowable to permit this as an extension?

The answers are "no" and "yes" - but the problem is that, in general, it does
not work. (There are potential wrong-code issues, albeit in the case below the
result is only bad and not very bad.)

The problem is that for

  #pragma omp target
    #pragma omp teams

on common offload hardware (like the AMD and Nvidia GPUs), the code is
not executed as:
- Start the offload kernel
- then, on the device start multiple teams

But:
- Start the offload kernel with several teams

Thus, it is impossible to add code between 'target' and 'teams'. With user code
in that position, it is really very easy to end up with wrong code! And there is
also no directive that syncs multiple teams (contention groups) - except that
once the teams construct has finished, all teams have finished. Otherwise, only
thread synchronization within a single team (contention group) works, i.e.
cgroup ~ pteam.
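
For illustration, a minimal sketch of the conforming form, where 'teams' is
strictly nested in 'target' with nothing in between. The function name
sum_nested and the loop body are made up for this example and are not taken
from the bug report:

```c
/* Hypothetical helper (not from the bug report): sums 0 .. n-1.
   'teams' follows 'target' directly, so no statement has to run
   between kernel start and team start, and the kernel can be
   launched with several teams.  */
long
sum_nested (int n)
{
  long sum = 0;
  #pragma omp target map(tofrom: sum)
  #pragma omp teams distribute parallel for reduction(+: sum)
  for (int i = 0; i < n; i++)
    sum += i;
  return sum;
}
```

Any statement placed between the two pragmas would have to run before the
teams exist, which is exactly what the launch model above cannot provide.
When compiled without -fopenmp, the pragmas are ignored and the loop simply
runs on the host.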

* * *

In particular, if I execute your code and set the GOMP_DEBUG=1 env var, I see:

  GOMP_OFFLOAD_run: kernel f$_omp_fn$0: launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 8), 1]
  GOMP_OFFLOAD_run: kernel f$_omp_fn$0: launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 8), 1]


If I replace the condition by
        when (user={condition(1)}: teams loop) \
the result is (twice):

  GOMP_OFFLOAD_run: kernel f$_omp_fn$0: launch [(teams: 60), 1, 1] [(lanes: 32), (threads: 8), 1]


Thus, the modified code runs with 60 teams while your code runs with only a
single team. This implies that the runtime could be up to 60 times shorter,
cutting it down from 1 hour to 1 minute!

(It could be even a bit faster if memory access + caching improve; typically,
the speed-up is quite a bit less than 60×, but it should still be very
visible!)
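
As a rough sketch of the variant with the constant condition: the function
name fill, the map clause, the loop body and the fallback variant
'parallel loop' are assumptions made for this example; the actual test case
is the one attached to the bug report.

```c
/* Hypothetical example (not the test case from the bug report).
   The selector is a compile-time constant, so the metadirective can
   be resolved to 'teams loop' at compile time and 'teams' ends up
   strictly nested in 'target'.  OpenMP 5.2 names the fallback
   clause 'otherwise'; 'default' is the 5.0/5.1 spelling.  */
void
fill (int *a, int n)
{
  #pragma omp target map(from: a[0:n])
  #pragma omp metadirective \
          when (user={condition(1)}: teams loop) \
          default (parallel loop)
  for (int i = 0; i < n; i++)
    a[i] = 2 * i;
}
```

Running such a program with GOMP_DEBUG=1 set should then show the team count
in the "launch [(teams: ...)" line, as in the output quoted above.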

* * *

BTW: I wonder why it doesn't work for

  constexpr int flag2 = 1;
...
        when (user={condition(flag2)}: teams loop) \

Shouldn't this be optimized at compile time (assuming C23 or a semi-new C++)?
I still see:

teams2.c:10:11: warning: ‘target’ construct with nested ‘teams’ construct contains directives outside of the ‘teams’ construct [-Wopenmp]

and for the launch accordingly: "launch [(teams: 1),".


And, unsurprisingly, I also get the same for:

        when (device={kind(nohost)}: teams loop) \

which is the case that we want to handle here.
