https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783

--- Comment #3 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Created attachment 63007
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63007&action=edit
main.cpp

In order to check whether this has anything to do with CUDA 13, I now installed

nvidia-driver-570.195.03

With this driver, nvidia-smi reports CUDA version 12.8.

Sadly, 

compute-sanitizer --tool memcheck ./a.out

then shows the same errors as above for my NVIDIA RTX 5060 Ti when the above
snippet is compiled with

gcc-16 -fopenmp -foffload=nvptx-none -save-temps -fno-stack-protector
./main.cpp

whereas compiling with clang (which has dedicated sm_120 support from NVIDIA)
yields no errors. So this seems to have something to do with Blackwell.

NVIDIA helped Clang with its Blackwell support, so the following slides may be
of interest to developers:

https://llvm.org/devmtg/2025-04/slides/technical_talk/ozen_blackwell.pdf

The reported errors are not just artifacts of compute-sanitizer.

To this post, I attached a matrix multiplication that performs the computation
in four ways:

- single-threaded on the host,
- on the host, with a parallel for collapse(2) construct in front of the first
  two loops,
- on the GPU, with a target teams distribute construct over the first loop and
  a parallel for construct over the second loop,
- on the GPU, with a target teams parallel for collapse(2) construct over the
  first two loops.

Sadly, the results do not agree when the code is compiled with gcc (where
NVIDIA's compute-sanitizer also shows the same errors), while after compiling
it with clang the results are always correct (i.e. they agree with the
single-threaded version) and compute-sanitizer shows no errors.
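
For readers without access to the attachment, the four variants described
above might look roughly like the following. This is a minimal sketch, not the
attached main.cpp: the function names, the run_variant helper, the row-major
layout, and the map clauses are all assumptions made for illustration, and the
last variant assumes the combined construct
target teams distribute parallel for collapse(2).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch; all names here are illustrative, not from main.cpp.
// Matrices are square, n x n, stored row-major in flat arrays.

// Variant 1: single-threaded reference on the host.
void matmul_serial(const double* A, const double* B, double* C, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// Variant 2: host threads over the collapsed i/j iteration space.
void matmul_host_parallel(const double* A, const double* B, double* C, int n) {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// Variant 3: teams distributed over i, threads of each team over j.
void matmul_target_split(const double* A, const double* B, double* C, int n) {
    #pragma omp target teams distribute \
        map(to: A[0:n*n], B[0:n*n]) map(from: C[0:n*n])
    for (int i = 0; i < n; ++i) {
        #pragma omp parallel for
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

// Variant 4: one combined construct over the collapsed i/j space (assumed form).
void matmul_target_collapse(const double* A, const double* B, double* C, int n) {
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: A[0:n*n], B[0:n*n]) map(from: C[0:n*n])
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

using MatMul = void (*)(const double*, const double*, double*, int);

// Hypothetical helper: runs one variant on fixed integer-valued inputs and
// returns C, so that results can be compared exactly with operator==.
std::vector<double> run_variant(MatMul f, int n) {
    assert(n > 0);
    const std::size_t size = static_cast<std::size_t>(n) * n;
    std::vector<double> A(size), B(size), C(size, -1.0);
    for (std::size_t i = 0; i < size; ++i) {
        A[i] = static_cast<double>(i % 7);
        B[i] = static_cast<double>(i % 5) - 2.0;
    }
    f(A.data(), B.data(), C.data(), n);
    return C;
}
```

A driver would call run_variant once per function and compare each result
against the one from matmul_serial. The k loop is sequential in every variant,
so the summation order is identical and the results should match exactly; any
mismatch points at the offloaded code rather than at floating-point rounding.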
