https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108558
Bug ID: 108558
Summary: OpenMP/Fortran 'has_device_addr' clause getting lost?
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Keywords: openmp
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: tschwinge at gcc dot gnu.org
CC: burnus at gcc dot gnu.org, jakub at gcc dot gnu.org
Target Milestone: ---

It's certainly possible that I'm doing something wrong here (first use of
OpenMP 'has_device_addr' clause), but please consider:

    subroutine vectorAdd(a, b, M)
      implicit none
      integer(4)::a(M), b(M)
      integer:: i, M

    !$omp target teams distribute parallel do has_device_addr(a, b)
      do i = 1, M
         b(i) = a(i) + b(i)
      end do

    end subroutine

(..., which is called from a '!$omp target data use_device_addr(a, b)' inside
a '!$omp target data map(tofrom:a(1:M), b(1:M))'.)

If I 'diff' the '-fopenmp -fdump-tree-all' without vs. with the
'has_device_addr(a, b)' clause, I -- unexpectedly -- get no differences (aside
from minor ones due to what seems to be different order of compiler
temporaries):

'pr.f90.005t.original':

    #pragma omp target
    [...]

(Decomposed combined construct.  Is that perhaps where the problem lies?)

'pr.f90.006t.gimple':

    #pragma omp target num_teams(0) thread_limit(0) firstprivate(m) map(tofrom:*b [len: D.4283][implicit]) map(alloc:b [pointer assign, bias: 0]) map(tofrom:*a [len: D.4280][implicit]) map(alloc:a [pointer assign, bias: 0])
    [...]

That is, 'map' instead of 'has_device_addr'.

In contrast, for the translated C code:

    void vectorAdd(int *a, int *b, int M)
    {
    #pragma omp target teams distribute parallel for has_device_addr(a, b)
      for (int i = 1; i < M; ++i)
        b[i] = a[i] + b[i];
    }

..., I see the expected 'diff' of 'pr.c.005t.original':

    - #pragma omp target
    + #pragma omp target has_device_addr(a) has_device_addr(b)

..., and 'diff' of 'pr.c.006t.gimple':

    - #pragma omp target num_teams(0) thread_limit(0) firstprivate(M) map(alloc:MEM[(char *)b] [len: 0]) map(firstprivate:b [pointer assign, bias: 0]) map(alloc:MEM[(char *)a] [len: 0]) map(firstprivate:a [pointer assign, bias: 0])
    + #pragma omp target num_teams(0) thread_limit(0) has_device_addr(a) has_device_addr(b) firstprivate(M)

(Have not examined that one any further.)

Cross-checking with the corresponding OpenACC/Fortran 'deviceptr' clause
('!$acc parallel loop deviceptr(a, b)'), that seems to work as expected (from
a quick look, not further examined):

'pr.f90.005t.original':

    - #pragma acc parallel
    + #pragma acc parallel map(force_deviceptr:*a) map(alloc:a [pointer assign, bias: 0]) map(force_deviceptr:*b) map(alloc:b [pointer assign, bias: 0])

'pr.f90.006t.gimple':

    - #pragma omp target oacc_parallel firstprivate(D.4291) map(tofrom:*b [len: D.4298]) map(alloc:b [pointer assign, bias: 0]) map(tofrom:*a [len: D.4295]) map(alloc:a [pointer assign, bias: 0])
    + a.8_9 = a;
    + b.9_10 = b;
    + #pragma omp target oacc_parallel map(force_deviceptr:(*a.8_9) [len: D.4298]) map(alloc:a [pointer assign, bias: 0]) map(force_deviceptr:(*b.9_10) [len: D.4295]) map(alloc:b [pointer assign, bias: 0]) firstprivate(D.4291)

That's with GCC based on fairly recent commit
de99049f6fe5341024d4d939ac50d063280f90db (2023-01-11).
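
For reference, a minimal sketch of the calling context described above (a '!$omp target data use_device_addr(a, b)' region nested inside a '!$omp target data map(tofrom:a(1:M), b(1:M))' region). The driver program, its name 'caller', the array size, and the initial values are assumptions made for illustration only; they are not taken from the actual test case:

```fortran
! Hypothetical driver: only the two nested 'target data' regions and the
! call to vectorAdd follow the description in this report.
program caller
  implicit none
  integer, parameter :: M = 8        ! assumed size, for illustration
  integer(4) :: a(M), b(M)
  integer :: i

  a = [(i, i = 1, M)]                ! assumed initial values
  b = 10

  ! Outer region: map both arrays to the device and back.
  !$omp target data map(tofrom:a(1:M), b(1:M))
  ! Inner region: within it, 'a' and 'b' refer to their device addresses.
  !$omp target data use_device_addr(a, b)
  call vectorAdd(a, b, M)
  !$omp end target data
  !$omp end target data

  print *, b
end program caller
```

Compiled together with the 'vectorAdd' subroutine using '-fopenmp -fdump-tree-all', this should give a complete program around the target region whose dumps are compared above.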