https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121178
--- Comment #2 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Oh well, sorry: in the last snippet, the line

    #pragma omp target enter data map(to:t.data[0:20])

should not be there. It should be like:

    mytensor t;
    int strides[2]={1,2};
    int extents[2]={4,5};
    t.data=(double*)omp_target_alloc(sizeof(double)*20,omp_get_default_device());
    t.strides=strides;
    t.extents=extents;

    #pragma omp target enter data map(to:t)
    #pragma omp target enter data map(to:t.strides[0:2])
    #pragma omp target enter data map(to:t.extents[0:2])

    #pragma omp target teams distribute
    for(int i=1; i<20; i++)
    {
        t.data[i]=20;
    }

    omp_target_free(t.data,omp_get_default_device());
    #pragma omp target exit data map (delete:t.strides[0:2])
    #pragma omp target exit data map (delete:t.extents[0:2])
    #pragma omp target exit data map(delete:t)

The conclusions remain the same...

It is just strange to get a call stack with a memcpy to device issued before the first alloc.

Start      Duration    Name              Result  CorrID  Pid    Tid    T-Pri  Thread Name
0,350945s  1,140 μs    cuInit            0       2       26641  26641  0      OpenMP Initial Thread
0,350968s  118,991 ms  cuCtxCreate_v2    0       8       26641  26641  0      OpenMP Initial Thread
0,470763s  2,037 ms    cuLinkCreate_v2   0       24      26641  26641  0      OpenMP Initial Thread
0,482308s  1,933 ms    cuLinkComplete    0       68      26641  26641  0      OpenMP Initial Thread
0,484242s  3,029 ms    cuModuleLoadData  0       69      26641  26641  0      OpenMP Initial Thread
0,487272s  1,050 μs    cuLinkDestroy     0       70      26641  26641  0      OpenMP Initial Thread
0,48764s   12,930 μs   cuMemcpyHtoD_v2   0       82      26641  26641  0      OpenMP Initial Thread
0,487655s  79,391 μs   cuMemAlloc_v2     0       84      26641  26641  0      OpenMP Initial Thread
0,487737s  3,470 μs    cuMemAlloc_v2     0       86      26641  26641  0      OpenMP Initial Thread
0,487743s  6,060 μs    cuMemcpyHtoD_v2   0       89      26641  26641  0      OpenMP Initial Thread
0,48775s   2,540 μs    cuMemAlloc_v2     0       91      26641  26641  0      OpenMP Initial Thread
0,487755s  2,340 μs    cuMemAlloc_v2     0       93      26641  26641  0      OpenMP Initial Thread
0,487758s  3,580 μs    cuMemcpyHtoD_v2   0       96      26641  26641  0      OpenMP Initial Thread
0,487762s  17,810 μs   cuMemcpyHtoD_v2   0       99      26641  26641  0      OpenMP Initial Thread
0,487781s  2,660 μs    cuMemAlloc_v2     0       101     26641  26641  0      OpenMP Initial Thread
0,487785s  3,630 μs    cuMemcpyHtoD_v2   0       104     26641  26641  0      OpenMP Initial Thread
0,487789s  4,970 μs    cuMemcpyHtoD_v2   0       107     26641  26641  0      OpenMP Initial Thread
0,487796s  2,360 μs    cuMemAlloc_v2     0       109     26641  26641  0      OpenMP Initial Thread
0,487799s  3,310 μs    cuMemcpyHtoD_v2   0       112     26641  26641  0      OpenMP Initial Thread
0,487803s  67,731 μs   cuMemAlloc_v2     0       113     26641  26641  0      OpenMP Initial Thread
0,487872s  118,921 μs  cuLaunchKernel    0       114     26641  26641  0      OpenMP Initial Thread
0,487992s  7,960 μs    cuCtxSynchronize  0       115     26641  26641  0      OpenMP Initial Thread
0,488001s  4,500 μs    cuMemFree_v2      0       118     26641  26641  0      OpenMP Initial Thread
0,488008s  2,940 μs    cuMemFree_v2      0       121     26641  26641  0      OpenMP Initial Thread
0,488012s  4,880 μs    cuMemcpyHtoD_v2   0       124     26641  26641  0      OpenMP Initial Thread
0,488018s  3,510 μs    cuMemFree_v2      0       127     26641  26641  0      OpenMP Initial Thread
0,488023s  3,460 μs    cuMemcpyHtoD_v2   0       130     26641  26641  0      OpenMP Initial Thread
0,488027s  4,661 μs    cuMemFree_v2      0       133     26641  26641  0      OpenMP Initial Thread
0,488032s  2,530 μs    cuMemFree_v2      0       136     26641  26641  0      OpenMP Initial Thread
0,488049s  96,211 μs   cuMemFree_v2      0       142     26641  26641  0      OpenMP Initial Thread
0,488146s  65,299 ms   cuCtxDestroy_v2   0       143     26641  26641  0      OpenMP Initial Thread
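
To make the ordering observation in the trace above easy to re-check against other traces, here is a minimal sketch in C. It only looks at the sequence of API names and reports whether the first cuMemcpyHtoD_v2 appears before the first cuMemAlloc_v2. The helper names first_index_of and memcpy_before_alloc, and the trace_names array, are hypothetical and written for this illustration; they are not part of libgomp, CUDA, or the profiler. The array holds just the first ten names from the trace, in start-time order.

```c
#include <stddef.h>
#include <string.h>

/* API names copied from the first ten rows of the trace, in start-time order.
   (Illustrative array; not produced by any tool.) */
const char *const trace_names[] = {
    "cuInit", "cuCtxCreate_v2", "cuLinkCreate_v2", "cuLinkComplete",
    "cuModuleLoadData", "cuLinkDestroy",
    "cuMemcpyHtoD_v2", "cuMemAlloc_v2", "cuMemAlloc_v2", "cuMemcpyHtoD_v2",
};
const size_t trace_len = sizeof trace_names / sizeof trace_names[0];

/* Hypothetical helper: index of the first entry equal to `name`,
   or -1 if the name does not occur. */
long first_index_of(const char *const *names, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0)
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: returns 1 if a host-to-device copy is recorded
   before the first device allocation (or with no allocation at all),
   and 0 otherwise, including when there is no copy. */
int memcpy_before_alloc(const char *const *names, size_t n)
{
    long copy = first_index_of(names, n, "cuMemcpyHtoD_v2");
    long alloc = first_index_of(names, n, "cuMemAlloc_v2");
    if (copy < 0)
        return 0;
    return alloc < 0 || copy < alloc;
}
```

Calling memcpy_before_alloc(trace_names, trace_len) from a small main returns 1 for these rows, matching the CorrID 82 copy that precedes the CorrID 84 allocation. The check says nothing about which buffer that first copy targets; it only confirms the order of the calls.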