https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118518

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|https://gcc.gnu.org/bugzill |
                   |a/show_bug.cgi?id=106445,   |
                   |https://gcc.gnu.org/bugzill |
                   |a/show_bug.cgi?id=105019    |
   Last reconfirmed|                            |2025-03-26
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |ice-on-valid-code, openacc
         Depends on|                            |89499, 106445, 117010
                 CC|                            |tschwinge at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #12 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Thanks for your submission; I'm working through this and your other ones.

(In reply to Benjamin Schulz from comment #11)
> if i write something like this:
> SET (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fopenacc -foffload=nvptx-none
> -foffload=-malias -fcf-protection=none -fno-stack-protector
> -U_FORTIFY_SOURCE -std=c++23 -no-pie")
>
> it still complains that alias definitions are not supported.

PTX '.alias' is available for PTX 6.3+, which GCC 14 doesn't default to, so
you'll need '-foffload-options=nvptx-none=-mptx=6.3' in addition to
'-foffload-options=nvptx-none=-malias'.

I'm working on (the upcoming) GCC 15.

Current status for nvptx offloading for your code per the 2025-02-04
attachments (with the MPI things disabled):

With '-std=c++23 -fopenacc -O0', we run into undeclared/missing symbols
during nvptx offloading compilation:

    ptxas ./a.xnvptx-none.mkoffload.o, line 1543; error   : Call to '_ZSt3powImdEN9__gnu_cxx11__promote_2IDTplcvNS1_IT_XsrSt12__is_integerIS2_E7__valueEE6__typeELi0EcvNS1_IT0_XsrS3_IS7_E7__valueEE6__typeELi0EEXsrS3_ISB_E7__valueEE6__typeES2_S7_' requires call prototype
    ptxas ./a.xnvptx-none.mkoffload.o, line 2561; error   : Call to '_ZN10datastructIdED1Ev' requires call prototype
    ptxas ./a.xnvptx-none.mkoffload.o, line 2568; error   : Call to '_ZN10datastructIdED1Ev' requires call prototype
    ptxas ./a.xnvptx-none.mkoffload.o, line 2575; error   : Call to '_ZN10datastructIdED1Ev' requires call prototype
    ptxas ./a.xnvptx-none.mkoffload.o, line 3335; error   : Call to '_ZN10datastructIdED1Ev' requires call prototype
    ptxas ./a.xnvptx-none.mkoffload.o, line 1543; error   : Unknown symbol '_ZSt3powImdEN9__gnu_cxx11__promote_2IDTplcvNS1_IT_XsrSt12__is_integerIS2_E7__valueEE6__typeELi0EcvNS1_IT0_XsrS3_IS7_E7__valueEE6__typeELi0EEXsrS3_ISB_E7__valueEE6__typeES2_S7_'
    ptxas ./a.xnvptx-none.mkoffload.o, line 2561; error   : Unknown symbol '_ZN10datastructIdED1Ev'
    ptxas ./a.xnvptx-none.mkoffload.o, line 2568; error   : Unknown symbol '_ZN10datastructIdED1Ev'
    ptxas ./a.xnvptx-none.mkoffload.o, line 2575; error   : Unknown symbol '_ZN10datastructIdED1Ev'
    ptxas ./a.xnvptx-none.mkoffload.o, line 3335; error   : Unknown symbol '_ZN10datastructIdED1Ev'
    [...]
    ptxas fatal   : Ptx assembly aborted due to errors
    nvptx-as: ptxas returned 255 exit status

The first 'error' is what I just filed as PR119485 "OpenACC offloading
compilation failure/ICE for C++ templated library functions".

The following 'error's, C++ destructors, that's very likely the issue already
reported/discussed in PR106445 "nvptx offloading: C++ constructor symbol alias
getting lost", PR117010 "[nvptx] Incorrect ptx code-gen for C++ code with
templates", which I'm looking into.

With '-std=c++23 -fopenacc -O1', we run into PR89499 "ICE in expand_UNIQUE, at
internal-fn.c:2605", which I need to resolve...

Therefore, add '-fno-inline'.

However, with '-std=c++23 -fopenacc -O1 -fno-inline', GCC then again ICEs
during nvptx offloading compilation as discussed in PR119485 "OpenACC
offloading compilation failure/ICE for C++ templated library functions".

Thus, replace 'pow([...])' with 'powf([...])'.

With this, compilation succeeds (within the bounds set above), and we get
Nvidia GPU execution as follows:

    $ ./a.out
    Ordinary matrix multiplication, on gpu
    80 90 100 110
    176 202 228 254
    272 314 356 398
    368 426 484 542
    A Cholesky decomposition with the multiplication on gpu
    4 12 -16
    12 37 -43
    -16 -43 98

    2 0 0
    6 1 0
    -8 5 3
    Now the cholesky decomposition is entirely done on gpu
    2 0 0
    6 1 0
    -8 5 3
    Now we do the same with the lu decomposition
    1 -2 -2 -3
    3 -9 0 -9
    -1 2 4 7
    -3 -6 26 2
    Just the multiplication on gpu
    1 0 0 0
    3 1 0 0
    -1 -0 1 0
    -3 4 -2 1

    1 -2 -2 -3
    0 -3 6 0
    0 0 2 4
    0 0 0 1
    Entirely on gpu
    1 0 0 0
    3 1 0 0
    -1 -0 1 0
    -3 4 -2 1

    1 -2 -2 -3
    0 -3 6 0
    0 0 2 4
    0 0 0 1
    Now we do the same with the qr decomposition
    12 -51 4
    6 167 -68
    -4 24 -41
    Just the multiplication on gpu
    0.857143 -0.394286 -0.331429
    0.428571 0.902857 0.0342857
    -0.285714 0.171429 -0.942857

    14 21 -14
    -8.88178e-16 175 -70
    -2.63678e-15 -5.06262e-14 35
    Entirely on gpu

    libgomp: cuStreamSynchronize error: an illegal memory access was encountered

(I've not checked these numbers, and not looked into that device-side SIGSEGV.)
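
To illustrate the 'pow' to 'powf' replacement, here is a minimal sketch. It
assumes the offending call passes an unsigned integer and a 'double', as the
mangled name '_ZSt3powImdE[...]' (that is, 'std::pow<unsigned long, double>')
suggests. The function name 'fill_squares' and the loop body are hypothetical
and not taken from the attachments.

```cpp
#include <math.h>   // declares the plain C function 'powf'
#include <cstddef>

// Hypothetical kernel, for illustration only: stores i^2 into out[i].
// Without '-fopenacc', the pragma is ignored and the loop runs on the host.
void fill_squares(float *out, std::size_t n)
{
#pragma acc parallel loop copyout(out[0:n])
  for (std::size_t i = 0; i < n; ++i)
    {
      // Before (mixed argument types select the promoting libstdc++
      // template 'std::pow<unsigned long, double>'):
      //   out[i] = pow(i, 2.0);
      // After (plain C library function, no template instantiation):
      out[i] = powf(static_cast<float>(i), 2.0f);
    }
}
```

Note that 'powf' computes in single precision, so results may differ slightly
from the 'double' variant for non-trivial arguments.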

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89499
[Bug 89499] [12/13/14/15 Regression] ICE in expand_UNIQUE, at
internal-fn.c:2605
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106445
[Bug 106445] nvptx offloading: C++ constructor symbol alias getting lost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117010
[Bug 117010] [nvptx] Incorrect ptx code-gen for C++ code with templates