https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108558

            Bug ID: 108558
           Summary: OpenMP/Fortran 'has_device_addr' clause getting lost?
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: openmp
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tschwinge at gcc dot gnu.org
                CC: burnus at gcc dot gnu.org, jakub at gcc dot gnu.org
  Target Milestone: ---

It's certainly possible that I'm doing something wrong here (first use of
OpenMP 'has_device_addr' clause), but please consider:

    subroutine vectorAdd(a, b, M)
      implicit none
      integer:: i, M
      integer(4)::a(M), b(M)
      !$omp target teams distribute parallel do has_device_addr(a, b)
      do i = 1, M
        b(i) = a(i) + b(i)
      end do
    end subroutine

(..., which is called from a '!$omp target data use_device_addr(a, b)' inside a
'!$omp target data map(tofrom:a(1:M), b(1:M))'.)

If I 'diff' the '-fopenmp -fdump-tree-all' without vs. with the
'has_device_addr(a, b)' clause, I -- unexpectedly -- get no differences (aside
from minor ones due to what seems to be different order of compiler
temporaries):

'pr.f90.005t.original':

    #pragma omp target
    [...]

(Decomposed combined construct.  Is that perhaps where the problem lies?)

'pr.f90.006t.gimple':

    #pragma omp target num_teams(0) thread_limit(0) firstprivate(m) map(tofrom:*b [len: D.4283][implicit]) map(alloc:b [pointer assign, bias: 0]) map(tofrom:*a [len: D.4280][implicit]) map(alloc:a [pointer assign, bias: 0])
    [...]

That is, 'map' instead of 'has_device_addr'.


In contrast, for the translated C code:

    void vectorAdd(int *a, int *b, int M)
    {
      #pragma omp target teams distribute parallel for has_device_addr(a, b)
      for (int i = 0; i < M; ++i)
        b[i] = a[i] + b[i];
    }

..., I see the expected 'diff' of 'pr.c.005t.original':

    -  #pragma omp target
    +  #pragma omp target has_device_addr(a) has_device_addr(b)

..., and 'diff' of 'pr.c.006t.gimple':

    -  #pragma omp target num_teams(0) thread_limit(0) firstprivate(M) map(alloc:MEM[(char *)b] [len: 0]) map(firstprivate:b [pointer assign, bias: 0]) map(alloc:MEM[(char *)a] [len: 0]) map(firstprivate:a [pointer assign, bias: 0])
    +  #pragma omp target num_teams(0) thread_limit(0) has_device_addr(a) has_device_addr(b) firstprivate(M)

(Have not examined that one any further.)
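For comparison, the calling context described for the Fortran test case (a
'target data use_device_...' region nested inside a 'target data map(tofrom:...)'
region) can be sketched in C.  This is a hypothetical driver, not part of the
reproducer: 'vectorAdd_ptr', 'vector_add_demo' and the test data are made up,
and the sketch assumes the long-standing pointer-based pairing
'use_device_ptr'/'is_device_ptr' in place of 'use_device_addr'/'has_device_addr',
so it says nothing about the clause handling reported here.  Without an offload
device (or without '-fopenmp') the regions simply execute on the host.

```c
#include <stdlib.h>

/* Hypothetical callee: 'a' and 'b' already hold device addresses.
   Assumption: is_device_ptr is used here in place of the report's
   has_device_addr(a, b). */
void vectorAdd_ptr(int *a, int *b, int M)
{
  #pragma omp target teams distribute parallel for is_device_ptr(a, b)
  for (int i = 0; i < M; ++i)
    b[i] = a[i] + b[i];
}

/* Hypothetical driver mirroring the Fortran caller: map both arrays,
   then expose their device addresses to the callee.  Returns the sum
   of b after the kernel (or -1 on allocation failure) so the result
   can be checked on the host. */
long vector_add_demo(int M)
{
  int *a = malloc(sizeof *a * (size_t) M);
  int *b = malloc(sizeof *b * (size_t) M);
  if (!a || !b) { free(a); free(b); return -1; }
  for (int i = 0; i < M; ++i) { a[i] = i; b[i] = 2 * i; }

  #pragma omp target data map(tofrom: a[0:M], b[0:M])
  {
    #pragma omp target data use_device_ptr(a, b)
    {
      /* Inside this region, a and b hold the device addresses. */
      vectorAdd_ptr(a, b, M);
    }
  } /* b is copied back to the host when this region ends. */

  long sum = 0;
  for (int i = 0; i < M; ++i)
    sum += b[i];
  free(a);
  free(b);
  return sum;
}
```

Calling 'vector_add_demo(4)' starts from a = {0, 1, 2, 3}, b = {0, 2, 4, 6} and
should return 18, the sum of the updated b = {0, 3, 6, 9}.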


Cross-checking with the corresponding OpenACC/Fortran 'deviceptr' clause
('!$acc parallel loop deviceptr(a, b)'), that seems to work as expected (from a
quick look, not further examined):

'pr.f90.005t.original':

    -    #pragma acc parallel
    +    #pragma acc parallel map(force_deviceptr:*a) map(alloc:a [pointer assign, bias: 0]) map(force_deviceptr:*b) map(alloc:b [pointer assign, bias: 0])

'pr.f90.006t.gimple':

    -    #pragma omp target oacc_parallel firstprivate(D.4291) map(tofrom:*b [len: D.4298]) map(alloc:b [pointer assign, bias: 0]) map(tofrom:*a [len: D.4295]) map(alloc:a [pointer assign, bias: 0])
    +    a.8_9 = a;
    +    b.9_10 = b;
    +    #pragma omp target oacc_parallel map(force_deviceptr:(*a.8_9) [len: D.4298]) map(alloc:a [pointer assign, bias: 0]) map(force_deviceptr:(*b.9_10) [len: D.4295]) map(alloc:b [pointer assign, bias: 0]) firstprivate(D.4291)


That's with GCC based on fairly recent commit
de99049f6fe5341024d4d939ac50d063280f90db (2023-01-11).
