https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783

--- Comment #4 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
I also want to note that this happens with a recent nvidia driver, 580.105.08:0/580, and CUDA 13 (with nvidia-cuda-toolkit-13.02), for which Sam made a gcc patch that drops sm_50 (https://bugs.gentoo.org/965845) and allows compilation with cuda-13.

So the wrong code generation does not depend on the nvidia-driver or cuda-toolkit version.

And since clang compiles even rather complicated OpenMP code correctly, I guess one can rule out a malfunction of my card.

I still have my old GPU, so I would be willing to lend my RTX 5060 Ti to a developer for a weekend if that is useful for providing a fix. But it will probably take longer than that to investigate the source of these problems.

For clang and nvidia, I found this document on Blackwell intrinsics. Perhaps it could help a bit, since clang compiles correct CUDA code for my system:

https://llvm.org/devmtg/2025-04/slides/technical_talk/ozen_blackwell.pdf

Clang compiles fine, unless I use the message passing interface OpenMPI with offloading. Then clang shows memory errors with CUDA even if I do not actually use any OpenMP code; just configuring the clang offload compiler suffices. The OpenMPI devs say this is a clang problem, apparently occurring when it initializes the runtime:
https://github.com/open-mpi/ompi/issues/13431#issuecomment-3558265950 So one cannot copy clang's entire approach into gcc... but perhaps one could adapt some of its (working) Blackwell support somehow.
