https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88940

            Bug ID: 88940
           Summary: [openacc, libgomp] cuModuleLoadData error for
                    asynchronous parallel with abort
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider abort-6.c (the async version of abort-1.c):
...
/* { dg-do run } */

#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  fprintf (stderr, "CheCKpOInT\n");
#pragma acc parallel async
  {
    abort ();
  }

  #pragma acc wait
  return 0;
}

/* { dg-output "CheCKpOInT" } */
/* { dg-shouldfail ""  } */
...

It passes at O2 like this:
...
CheCKpOInT

libgomp: cuModuleLoadData error: an illegal instruction was encountered

libgomp: Cannot map target functions or variables (expected 1, have 4294967295)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/abort-6.c \ 
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O2  execution test
...

Normally, the compiler can optimize away the code following an abort, because
the abort is a noreturn call.

In the case of an async parallel however, the parallel region is executed
asynchronously, that is, the expected behaviour is:
- the kernel representing the parallel region is launched
- the code launching the kernel does not wait for the launch to start or
finish,
- the code following the parallel region is executed

So, in the case of the test-case above, we can eliminate all code after the
abort inside the parallel async region, but we can't eliminated any code after
the parallel async region.

The compiler presently does precisely that: it removes the code after the
parallel async region. At eh, we have:
...
main ()
{
  int D.3104;

  stderr.0_1 = stderr;
  __builtin_fwrite ("CheCKpOInT\n", 1, 11, stderr.0_1);
  #pragma omp target oacc_parallel async(-1) [child fn: main._omp_fn.0 (???)]
  abort ();
  #pragma omp return
  __builtin_GOACC_wait (-2, 0);
  D.3104 = 0;
  goto <D.3109>;
  D.3104 = 0;
  goto <D.3109>;
  <D.3109>:
  return D.3104;
}
...

and at cfg, we have:
...
main ()
{
  int D.3104;

  <bb 2> :
  stderr.0_1 = stderr;
  __builtin_fwrite ("CheCKpOInT\n", 1, 11, stderr.0_1);
  #pragma omp target oacc_parallel async(-1) [child fn: main._omp_fn.0 (???)]   

  <bb 3> :
  abort ();

}
...

And at optimized:
...
main ()
{
  struct _IO_FILE * stderr.0_1;

  <bb 2> [local count: 1073741824]:
  stderr.0_1 = stderr;
  fwrite ("CheCKpOInT\n", 1, 11, stderr.0_1);
  GOACC_parallel_keyed (-1, main._omp_fn.0, 0, 0B, 0B, 0B, 536936447, -1, 0);
  __builtin_unreachable ();

}
...

The __builtin_unreachable seems to expand to nothing, so at runtime, when
executing main, we return from GOACC_parallel_keyed, step out of main, and
start executing the code after main, which happens to be _start, which
eventually leads to cuModuleLoadData picking up the result of the abort, which
leads to the "Cannot map" error message.

This problem is encountered for the nvptx target, but I think this is a generic
openacc problem.

Reply via email to