https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783
--- Comment #4 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
I also want to note that this happens with a recent NVIDIA driver, 580.105.08 (Gentoo slot 0/580), and CUDA 13 (with nvidia-cuda-toolkit-13.02), for which Sam made a GCC patch that drops sm_50 and allows compilation with CUDA 13: https://bugs.gentoo.org/965845. So the wrong code generation does not depend on the nvidia-drivers or CUDA toolkit version.

And since clang compiles even rather complicated OpenMP code correctly, I think one can rule out a malfunction of my card. I still have my old GPU, so I would be willing to lend my RTX 5060 Ti to a developer for a weekend if that would help produce a fix, though investigating the source of these problems will probably take longer than that.

For clang and NVIDIA, I found this document on Blackwell intrinsics. Perhaps it could help a bit, since clang generates correct CUDA code for my system: https://llvm.org/devmtg/2025-04/slides/technical_talk/ozen_blackwell.pdf

Clang compiles fine unless I use the message-passing interface Open MPI together with offloading. Then clang shows memory errors with CUDA even if I do not actually use any OpenMP code; merely configuring clang as the offload compiler suffices. The Open MPI developers say this is a clang problem, apparently occurring when it initializes the runtime (a minimal sketch of that scenario is at the end of this comment): https://github.com/open-mpi/ompi/issues/13431#issuecomment-3558265950

So one cannot copy clang's entire approach into GCC, but perhaps one could adapt some of its (working) Blackwell support.
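To make the comparison concrete, here is a minimal sketch of the kind of OpenMP offload kernel involved. This is a generic example written for this comment, not the actual code from my project; it is just the usual pattern (a target region with map clauses plus a host-side check) that a developer could use to compare the device code gcc and clang generate for sm_120:

  // Generic OpenMP offload sketch, not the original failing code.
  // Wrong device code generation shows up as a nonzero error count.
  #include <cstdio>
  #include <vector>

  int main() {
      const int n = 1 << 20;
      std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
      double *pa = a.data(), *pb = b.data(), *pc = c.data();

      // Offload a simple vector addition to the GPU.
      #pragma omp target teams distribute parallel for \
          map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
      for (int i = 0; i < n; ++i)
          pc[i] = pa[i] + pb[i];

      // Verify the result on the host.
      long errors = 0;
      for (int i = 0; i < n; ++i)
          if (pc[i] != 3.0)
              ++errors;
      printf("%ld errors\n", errors);
      return errors != 0;
  }

I would build this with something like "g++ -fopenmp -foffload=nvptx-none test.cpp" for gcc and "clang++ -fopenmp --offload-arch=sm_120 test.cpp" for clang (exact flags depending on how the toolchains were configured), and then compare the results.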

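For completeness: the Open MPI scenario mentioned above needs no OpenMP pragmas at all. As I understand the linked issue, a plain MPI hello-world like the following sketch is already enough to trigger the memory errors once it is built with clang configured as an offload compiler (e.g. via the mpicxx wrapper with OMPI_CXX pointing at the offload-enabled clang):

  // Generic MPI sketch: no OpenMP code at all. Per the linked Open MPI
  // issue, the problem apparently appears during runtime initialization.
  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);   // initialization is where the issue points
      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      printf("hello from rank %d\n", rank);
      MPI_Finalize();
      return 0;
  }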