Package: gcc-14-offload-nvptx
Version: 14.2.0-19
Severity: normal

Dear Maintainer,

With the current Debian gcc-14-offload-nvptx and gcc-13-offload-nvptx packages, code that uses OpenMP to offload to the GPU and requires unified_shared_memory is never run on the GPU. The offload code is compiled, but it is never executed on the device. If I run the program with OMP_TARGET_OFFLOAD=MANDATORY, it fails with the error

  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available

which is incorrect. If I comment out the line

  #pragma omp requires unified_shared_memory

then the code _does_ run on the GPU.

The strange part is that exactly the same code, on exactly the same hardware, used to work. I cannot pinpoint which update of which package caused the problem. I do know that the same code used to offload on this very laptop, running sid, some time ago (about a year?), and that it still offloads to the GPU on another system running an old CentOS release with a hand-compiled GCC 13 that offloads to nvptx-none. Did anything change in the required system setup that I missed?

Since the code _does_ compile and _does_ run without errors unless OMP_TARGET_OFFLOAD=MANDATORY is set explicitly, the change may have happened some time ago without my noticing until now.

Here is a simple hello-world program that shows the issue:

#include <stdio.h>
#include <math.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define NX 100000000

#pragma omp requires unified_shared_memory

int main(void)
{
    double vecA[NX],vecB[NX],vecC[NX];
    double r=0.2;

    /* Initialization of vectors */
    #pragma omp target teams distribute parallel for simd
    for (long i = 0; i < NX; i++) {
        vecA[i] = pow(r, i);
        vecB[i] = 1.0;
    }

    /* dot product of two vectors */
    #pragma omp target teams distribute parallel for simd
    for (long i = 0; i < NX; i++) {
        vecC[i] = vecA[i] * vecB[i];
    }

    double sum = 0.0;
    /* calculate the sum */
    #pragma omp target teams distribute parallel for simd reduction(+:sum)
    for (long i = 0; i < NX; i++) {
        sum += vecC[i];
    }
    printf("The sum is: %8.6f \n", sum);
    return 0;
}

I can compile it with, e.g.,

  gcc -O3 -fopenmp -foffload=nvptx-none -foffload-options="-O3 -fopt-info -lm" -o test test.c -lm

It compiles flawlessly and, because the -fopt-info option is passed to the offload compiler (and only to the offload compiler), prints the following:

  test.c:25:14: optimized: loop unrolled 7 times
  test.c:32:9: optimized: loop unrolled 7 times
  test.c:19:14: optimized: loop unrolled 3 times
  test.c:30:11: optimized: basic block part vectorized using 16 byte vectors
  test.c:23:9: optimized: basic block part vectorized using 16 byte vectors
  test.c:16:9: optimized: basic block part vectorized using 16 byte vectors

If I run it (after allowing for a large stack with ulimit -s unlimited) with just

  ./test

I get:

  The sum is: 1.250000

If I run it with

  OMP_TARGET_OFFLOAD=MANDATORY ./test

while the #pragma omp requires unified_shared_memory line is active (not commented out), I get

  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available

If instead I comment out

  #pragma omp requires unified_shared_memory

recompile, and rerun with

  OMP_TARGET_OFFLOAD=MANDATORY ./test

it does run on the GPU and produces the expected result.

I actually verified with nvidia-smi that it is indeed running on the GPU.

Thanks in advance for any info on how to get it to run again on the GPU also when #pragma omp requires unified_shared_memory is enabled.

Best regards,
Giacomo Mulas

-- System Information:
Debian Release: 13.0
  APT prefers unstable
  APT policy: (401, 'unstable'), (10, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.12.32-amd64 (SMP w/12 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_WARN, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=it_IT.UTF-8, LC_CTYPE=it_IT.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages gcc-14-offload-nvptx depends on:
ii  gcc-14                 14.2.0-19
ii  gcc-14-base            14.2.0-19
ii  libc6                  2.41-8
ii  libc6-dev              2.41-8
ii  libgmp10               2:6.3.0+dfsg-3
ii  libgomp-plugin-nvptx1  14.2.0-19
ii  libmpc3                1.3.1-1+b3
ii  libmpfr6               4.2.2-1
ii  libzstd1               1.5.7+dfsg-1
ii  nvptx-tools            0.20240810-3
ii  zlib1g                 1:1.3.dfsg+really1.3.1-1+b1

gcc-14-offload-nvptx recommends no packages.

gcc-14-offload-nvptx suggests no packages.

-- no debconf information