On Thu, Jul 02, 2020 at 05:15:20PM +0100, Andrew Stubbs wrote:
> This patch, originally by Kwok, auto-adjusts the default OpenMP target
> arguments to set num_threads(1) when there are no parallel regions. There
> may still be multiple teams in this case.
> 
> The result is that libgomp will not attempt to launch GPU threads that will
> never get used.
> 
> OK to commit?

That doesn't look safe to me.
My understanding of the patch is that it looks for a parallel construct
lexically within the target region, but that isn't sufficient.  The
optimization is valid only if no parallel construct can be encountered
anywhere in the target region at runtime, i.e. in the body and in all
functions that are called from it.  For example:

void
foo ()
{
  #pragma omp distribute parallel for simd
  for (int i = 0; i < 10000000; i++)
    do_something;
}

extern void baz (); // function that calls foo, unconditionally or conditionally
#pragma omp declare target to (foo, baz)

void
bar ()
{
  #pragma omp target teams
  baz ();
}

Perhaps one could ignore some builtin calls, but only those where one can
assume there is no OpenMP code in them.

Also, the optimization must not be done if omp_get_thread_limit () is
called in the region, or might be called indirectly from it, because if the
optimization forces thread_limit (1), omp_get_thread_limit () in the
region will also return 1 rather than the expected value.
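
To make the two conditions above concrete, here is a rough sketch of the kind
of conservative check I have in mind.  It is only a toy model in plain C, not
GCC code: struct fn_summary, its fields, single_thread_is_safe and the two
example tables are all made up for illustration.  A function whose body is not
visible is treated as if it might do anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4
#define MAX_FNS 64

/* Hypothetical per-function summary; not a GCC data structure.  */
struct fn_summary
{
  bool body_known;          /* false e.g. for an extern function */
  bool has_parallel;        /* body contains a parallel construct */
  bool calls_thread_limit;  /* body calls omp_get_thread_limit () */
  size_t n_callees;
  int callees[MAX_CALLEES]; /* indices into the summary table */
};

/* Return true if function IDX, or anything reachable from it, might
   encounter a parallel construct or call omp_get_thread_limit ().  */
static bool
may_need_threads (const struct fn_summary *fns, size_t n_fns, int idx,
                  bool *visited)
{
  assert (idx >= 0 && (size_t) idx < n_fns);
  if (visited[idx])
    return false; /* already examined; a cycle adds nothing new */
  visited[idx] = true;

  const struct fn_summary *f = &fns[idx];
  if (!f->body_known || f->has_parallel || f->calls_thread_limit)
    return true;

  for (size_t i = 0; i < f->n_callees; i++)
    if (may_need_threads (fns, n_fns, f->callees[i], visited))
      return true;
  return false;
}

/* Return true if forcing a single thread for the target region whose
   body is summarized at index ROOT cannot be observed.  */
bool
single_thread_is_safe (const struct fn_summary *fns, size_t n_fns, int root)
{
  bool visited[MAX_FNS] = { false };
  assert (n_fns <= MAX_FNS);
  return !may_need_threads (fns, n_fns, root, visited);
}

/* The example from this mail: 0 = target region in bar, 1 = baz, 2 = foo.  */
const struct fn_summary example_unsafe[] = {
  { true, false, false, 1, { 1 } },  /* region body: calls baz */
  { false, false, false, 0, { 0 } }, /* baz: extern, body not visible */
  { true, true, false, 0, { 0 } },   /* foo: contains a parallel construct */
};

/* A region that only calls a visible leaf function without OpenMP code.  */
const struct fn_summary example_safe[] = {
  { true, false, false, 1, { 1 } },  /* region body: calls the leaf */
  { true, false, false, 0, { 0 } },  /* leaf: known body, nothing relevant */
};
```

With these tables, single_thread_is_safe (example_unsafe, 3, 0) is false
because the body of baz is not visible, while
single_thread_is_safe (example_safe, 2, 0) is true.  Builtins known to contain
no OpenMP code could be modeled as entries with body_known set and no callees.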

        Jakub
