https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121178
Bug ID: 121178
Summary: gcc openacc: libgomp: struct not mapped for detach operation if the borders are given on delete, nvc++ compiles correctly
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: schulz.benjamin at googlemail dot com
CC: jakub at gcc dot gnu.org
Target Milestone: ---

The following snippet compiles with NVIDIA's nvc++ and the options:

/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/compilers/bin/nvc++ ./main.cpp -acc -gpu=cuda12.9 -Minfo=all

     20, Generating copyin(t2.strides[:2],t2.extents[:2],t2.data[:20],t2) [if not already present]
         Generating present(t2)
         Generating NVIDIA GPU code
         28, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
     37, Generating exit data delete(t2,t2.data[:20],t2.strides[:2],t2.extents[:2])

and it runs correctly.

GCC 15.1, with

-fopenacc -foffload=nvptx-none -fcf-protection=none -fno-stack-protector -no-pie -U_FORTIFY_SOURCE

yields:

libgomp: struct not mapped for detach operation

#include <openacc.h>

struct mytensor
{
    int *strides;
    int *extents;
    double *data;
};

int main()
{
    mytensor t2;
    int strides[2]={1,5};
    int extents[2]={4,5};
    double data[20]={1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
    t2.strides=strides;
    t2.extents=extents;
    t2.data=data;

    #pragma acc data copyin(t2)
    #pragma acc data copyin(t2.strides[0:2])
    #pragma acc data copyin(t2.extents[0:2])
    #pragma acc data copyin(t2.data[0:20])
    #pragma acc parallel loop gang present(t2)
    for(int i=1; i<20; i++)
    {
        t2.data[i]=20;
    }

    #pragma acc exit data delete(t2.data[0:20])
    #pragma acc exit data delete(t2.extents[0:2])
    #pragma acc exit data delete(t2.strides[0:2])
    #pragma acc exit data delete(t2)
}

Note that nvc++ also works when the copyin statements are replaced by a single line (in the correct order: the struct must be mapped first):

#pragma acc data copyin(t2,t2.strides[0:2],t2.extents[0:2],t2.data[0:20])

In that case, libgomp complains that it could not attach....
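For anyone trying to reproduce the attach variant mentioned above, the single-clause form might look roughly like the sketch below. It is reconstructed from the description, not taken from the reporter's actual file: the function name run_combined_copyin, the explicit braces around the data region, and the returned value are assumptions added for illustration. Built without -fopenacc the pragmas are ignored and the function runs serially on the host.

```cpp
#include <cassert>

struct mytensor
{
    int *strides;
    int *extents;
    double *data;
};

// Hypothetical helper (not part of the original report): maps the struct
// and its three member arrays with one combined copyin clause.
double run_combined_copyin()
{
    int strides[2] = {1, 5};
    int extents[2] = {4, 5};
    double data[20];
    for (int i = 0; i < 20; i++)
        data[i] = i + 1;

    mytensor t2;
    t2.strides = strides;
    t2.extents = extents;
    t2.data = data;

    // Host-side sanity check: the extents describe the 20 data elements.
    assert(t2.extents[0] * t2.extents[1] == 20);

    // The struct is listed first, then the member array sections.
    #pragma acc data copyin(t2, t2.strides[0:2], t2.extents[0:2], t2.data[0:20])
    {
        #pragma acc parallel loop gang present(t2)
        for (int i = 1; i < 20; i++)
        {
            t2.data[i] = 20;
        }
    }

    // Element 0 is never written by the loop, so the host value stays 1
    // whether or not the loop was offloaded.
    return data[0];
}
```

Calling run_combined_copyin() from main() in a build with -fopenacc -foffload=nvptx-none should exercise the attach path described above.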
For the detach case, I cannot put the exit data statements into a single line. The compiler would then complain that t2 appears several times and stop compilation. Nevertheless, even when the commands are separated and the mappings are issued in the correct order, nvc++ appears to handle this and gcc does not.

The reason for the detach problem appears to be that I gave the bounds of the arrays. If I write

#pragma acc exit data delete(t2.data)
#pragma acc exit data delete(t2.extents)
#pragma acc exit data delete(t2.strides)
#pragma acc exit data delete(t2)

then libgomp does not complain anymore. However, if I compile that with nvc++, it reports:

main:
     20, Generating copyin(t2.strides[:2],t2.extents[:2],t2.data[:20],t2) [if not already present]
         Generating present(t2)
         Generating NVIDIA GPU code
         28, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
     37, Generating exit data delete(t2,t2.data[:1],t2.strides[:1],t2.extents[:1])

But I surely do not want to delete just the first element.

NVIDIA writes that delete statements in OpenACC should be written like this:
https://docs.nvidia.com/hpc-sdk/compilers/openacc-gs/index.html

#pragma acc exit data delete( data[0:size], this[0:1])
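One observation that may help triage: the reproducer pairs structured "acc data" constructs, whose mappings end together with the loop they enclose, with unstructured "exit data" directives issued afterwards. Assuming the intent is for the mappings to persist until they are explicitly deleted, the matching unstructured form would use "enter data", as in the sketch below. The function name run_unstructured and the returned value are hypothetical, and whether GCC 15.1 runs this without the detach error has not been checked here.

```cpp
#include <cassert>

struct mytensor
{
    int *strides;
    int *extents;
    double *data;
};

// Hypothetical helper (not part of the original report): same computation,
// but with unstructured enter data / exit data pairs.
double run_unstructured()
{
    int strides[2] = {1, 5};
    int extents[2] = {4, 5};
    double data[20];
    for (int i = 0; i < 20; i++)
        data[i] = i + 1;

    mytensor t2;
    t2.strides = strides;
    t2.extents = extents;
    t2.data = data;

    // Host-side sanity check: the extents describe the 20 data elements.
    assert(t2.extents[0] * t2.extents[1] == 20);

    // Map the struct first, then each member array section.
    #pragma acc enter data copyin(t2)
    #pragma acc enter data copyin(t2.strides[0:2])
    #pragma acc enter data copyin(t2.extents[0:2])
    #pragma acc enter data copyin(t2.data[0:20])

    #pragma acc parallel loop gang present(t2)
    for (int i = 1; i < 20; i++)
    {
        t2.data[i] = 20;
    }

    // Release in reverse order, with the same array sections as on entry.
    #pragma acc exit data delete(t2.data[0:20])
    #pragma acc exit data delete(t2.extents[0:2])
    #pragma acc exit data delete(t2.strides[0:2])
    #pragma acc exit data delete(t2)

    // Element 0 is never written by the loop, so the host value stays 1
    // whether or not the loop was offloaded.
    return data[0];
}
```

If this variant behaves differently from the original reproducer under GCC, that would narrow the problem to the combination of structured data regions with a later exit data on array sections.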