https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121178

            Bug ID: 121178
           Summary: gcc openacc: libgomp: struct not mapped for detach
                    operation if the borders are given on delete, nvc++
                    compiles correctly
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: schulz.benjamin at googlemail dot com
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

The following snippet compiles with NVIDIA's nvc++ using these options:

/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/compilers/bin/nvc++ ./main.cpp -acc
-gpu=cuda12.9 -Minfo=all

20, Generating copyin(t2.strides[:2],t2.extents[:2],t2.data[:20],t2) [if not
already present]
         Generating present(t2)
         Generating NVIDIA GPU code
         28, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
     37, Generating exit data
delete(t2,t2.data[:20],t2.strides[:2],t2.extents[:2])


The resulting program runs correctly.

GCC 15.1, with the options

-fopenacc -foffload=nvptx-none  -fcf-protection=none -fno-stack-protector
-no-pie  -U_FORTIFY_SOURCE

produces a program that aborts at run time with:

libgomp: struct not mapped for detach operation


#include <openacc.h>

struct mytensor
{
    int *strides;
    int *extents;
    double *data;
};

int main()
{

    mytensor t2;
    int strides[2]={1,5};
    int extents[2]={4,5};
    double data[20]={1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};

    t2.strides=strides;
    t2.extents=extents;
    t2.data=data;

#pragma acc data copyin(t2)
#pragma acc data copyin(t2.strides[0:2])
#pragma acc data copyin(t2.extents[0:2])
#pragma acc data copyin(t2.data[0:20])

#pragma acc parallel loop gang present(t2)
    for(int i=1; i<20; i++)
    {
        t2.data[i]=20;
    }

    #pragma acc exit data delete(t2.data[0:20])
    #pragma acc exit data delete(t2.extents[0:2])
    #pragma acc exit data delete(t2.strides[0:2])
    #pragma acc exit data delete(t2)
}



Note that nvc++ also works when the copyin statements are replaced by a single
line (in the correct order: the struct must be mapped first):

#pragma acc data copyin(t2,t2.strides[0:2],t2.extents[0:2],t2.data[0:20])

In that case, libgomp complains that it could not attach.

For the detach case, I cannot put the exit data statements into a single line.
The compiler would then complain that t2 appears several times and stop
compilation.

Nevertheless, even with the directives separated and the mappings issued in the
correct order, nvc++ handles this and GCC does not.
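
For comparison, here is a minimal sketch (not part of the original report) of
the same mapping order written with unstructured enter data / exit data pairs.
The reproducer above uses structured data constructs, whose lifetime ends with
the statement they apply to, and then issues exit data afterwards. The function
name fill_and_sum and the reduction variable are hypothetical and exist only so
the result can be checked on the host. Whether this form avoids the libgomp
error has not been verified here.

```cpp
struct mytensor
{
    int *strides;
    int *extents;
    double *data;
};

// Hypothetical helper, for illustration only: maps the struct and its
// members, runs the loop from the report, and unmaps in reverse order.
double fill_and_sum()
{
    mytensor t2;
    int strides[2]={1,5};
    int extents[2]={4,5};
    double data[20]={1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};

    t2.strides=strides;
    t2.extents=extents;
    t2.data=data;

    // Parent struct first, so the member pointers have something to attach to.
    #pragma acc enter data copyin(t2)
    #pragma acc enter data copyin(t2.strides[0:2])
    #pragma acc enter data copyin(t2.extents[0:2])
    #pragma acc enter data copyin(t2.data[0:20])

    // The scalar reduction result is returned to the host, so the check
    // does not depend on copying the array back.
    double sum=0.0;
    #pragma acc parallel loop gang present(t2) reduction(+:sum)
    for(int i=1; i<20; i++)
    {
        t2.data[i]=20;
        sum+=t2.data[i];
    }

    // Members first (detach), parent struct last.
    #pragma acc exit data delete(t2.data[0:20])
    #pragma acc exit data delete(t2.extents[0:2])
    #pragma acc exit data delete(t2.strides[0:2])
    #pragma acc exit data delete(t2)

    return sum;
}
```

Without OpenACC enabled the pragmas are ignored and the loop runs on the host,
which gives the same sum.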



The cause of the detach problem appears to be that I gave the bounds of the
array sections.

If I write

    #pragma acc exit data delete(t2.data)
    #pragma acc exit data delete(t2.extents)
    #pragma acc exit data delete(t2.strides)
    #pragma acc exit data delete(t2)

then libgomp no longer complains.

However, if I compile that with nvc++, it reports:

main:
     20, Generating copyin(t2.strides[:2],t2.extents[:2],t2.data[:20],t2) [if
not already present]
         Generating present(t2)
         Generating NVIDIA GPU code
         28, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
     37, Generating exit data
delete(t2,t2.data[:1],t2.strides[:1],t2.extents[:1])


But I certainly do not want to delete just the first element.

NVIDIA's documentation says that delete clauses in OpenACC should be written like this:

https://docs.nvidia.com/hpc-sdk/compilers/openacc-gs/index.html

#pragma acc exit data delete( data[0:size], this[0:1])
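
To show where such a directive would sit, here is a minimal sketch of a class
that maps itself in the constructor and unmaps itself in the destructor. The
class name device_buffer, its members, and the enter data clause in the
constructor are assumptions made for illustration. Only the exit data line is
taken from the directive quoted above. Whether GCC accepts this[0:1] in these
clauses has not been checked here.

```cpp
#include <cstddef>

// Hypothetical class, for illustration only.
class device_buffer
{
public:
    explicit device_buffer(std::size_t n) : size(n), data(new double[n]())
    {
        // Assumed counterpart of the delete below: the object first,
        // then the array it points to.
        #pragma acc enter data copyin(this[0:1]) create(data[0:size])
    }

    ~device_buffer()
    {
        // Form quoted above: array section with explicit bounds, then the object.
        #pragma acc exit data delete(data[0:size], this[0:1])
        delete[] data;
    }

    // Copying would duplicate the host pointer and the mapping; disallow it.
    device_buffer(const device_buffer&) = delete;
    device_buffer& operator=(const device_buffer&) = delete;

    void fill(double v)
    {
        #pragma acc parallel loop present(data[0:size])
        for(std::size_t i=0; i<size; i++)
        {
            data[i]=v;
        }
        // Bring the device values back so the host copy can be read.
        #pragma acc update self(data[0:size])
    }

    double at(std::size_t i) const { return data[i]; }
    std::size_t length() const { return size; }

private:
    std::size_t size;
    double *data;
};
```

The explicit bounds in the delete clause mirror the bounds used when the array
was mapped, which is the pattern at issue in this report.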