https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783
--- Comment #3 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Created attachment 63007
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63007&action=edit
main.cpp

In order to check whether this has anything to do with CUDA 13, I installed nvidia-driver-570.195.03; nvidia-smi now reports CUDA version 12.8.

Sadly, compute-sanitizer --tool memcheck ./a.out still shows the same errors as above on my NVIDIA RTX 5060 Ti when the snippet is compiled with

gcc-16 -fopenmp -foffload=nvptx-none -save-temps -fno-stack-protector ./main.cpp

while compiling with clang (which has dedicated sm_120 support from NVIDIA) yields no errors. So this seems to have something to do with Blackwell. NVIDIA helped Clang with its Blackwell support, so this document may be of interest to developers:

https://llvm.org/devmtg/2025-04/slides/technical_talk/ozen_blackwell.pdf

The reported errors are not just diagnostics from compute-sanitizer. The attached main.cpp performs a matrix multiplication in four variants: single-threaded on the host, with a parallel for collapse(2) statement over the first two loops on the host, on the GPU with a target teams distribute statement over the first loop and a parallel for construct over the second loop, and with a target teams distribute parallel for collapse(2) statement over the first two loops (a sketch of this layout is given below).

Sadly, when the code is compiled with gcc, the GPU results do not agree with the single-threaded host version (and NVIDIA's compute-sanitizer shows the same errors), while after compiling it with clang the results are always correct (i.e. they agree with the single-threaded version) and compute-sanitizer shows no errors.
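
For reference, here is a minimal sketch of the kind of loop/construct layout described above; it is not the attached main.cpp, and the matrix size N and all variable names are assumptions made for illustration only:

#include <cstdio>
#include <vector>

int main() {
    const int N = 256;  // assumed size, not taken from the report
    std::vector<double> A(N * N, 1.0), B(N * N, 2.0);
    std::vector<double> C_ref(N * N, 0.0), C1(N * N, 0.0), C2(N * N, 0.0);

    // Variant 1: single-threaded reference on the host.
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                C_ref[i * N + j] += A[i * N + k] * B[k * N + j];

    double *a = A.data(), *b = B.data(), *c1 = C1.data(), *c2 = C2.data();

    // Variant 3: target teams distribute over the first loop,
    // parallel for over the second loop.
    #pragma omp target teams distribute \
        map(to: a[0:N*N], b[0:N*N]) map(tofrom: c1[0:N*N])
    for (int i = 0; i < N; ++i) {
        #pragma omp parallel for
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                c1[i * N + j] += a[i * N + k] * b[k * N + j];
    }

    // Variant 4: combined construct with collapse(2) over the first two loops.
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:N*N], b[0:N*N]) map(tofrom: c2[0:N*N])
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                c2[i * N + j] += a[i * N + k] * b[k * N + j];

    // Compare both GPU results against the single-threaded host reference.
    int bad1 = 0, bad2 = 0;
    for (int i = 0; i < N * N; ++i) {
        if (C_ref[i] != C1[i]) ++bad1;
        if (C_ref[i] != C2[i]) ++bad2;
    }
    std::printf("mismatches: distribute + parallel for: %d, collapse(2): %d\n",
                bad1, bad2);
    return (bad1 != 0 || bad2 != 0);
}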
