https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120865
--- Comment #3 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Options were: -O1 -fopenmp -foffload=nvptx-none -fno-stack-protector -Wall

Note that without -O I get the following (i.e. without optimization, the program terminates ordinarily...):

Ordinary matrix multiplication, on gpu
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15

80 90 100 110
176 202 228 254
272 314 356 398
368 426 484 542

A Cholesky decomposition with the multiplication on gpu
4 12 -16
12 37 -43
-16 -43 98

2 0 0
6 1 0
-8 5 3

Now the cholesky decomposition is entirely done on gpu
2 0 0
6 1 0
-8 5 3

Now we do the same with the lu decomposition
1 -2 -2 -3
3 -9 0 -9
-1 2 4 7
-3 -6 26 2

Just the multiplication on gpu
1 0 0 0
3 1 0 0
-1 -0 1 0
-3 4 -2 1

1 -2 -2 -3
0 -3 6 0
0 0 2 4
0 0 0 1

Entirely on gpu
1 0 0 0
3 1 0 0
-1 -0 1 0
-3 4 -2 1

1 -2 -2 -3
0 -3 6 0
0 0 2 4
0 0 0 1

Now we do the same with the qr decomposition
12 -51 4
6 167 -68
-4 24 -41

Just the multiplication on gpu
0.857143 -0.394286 -0.331429
0.428571 0.902857 0.0342857
-0.285714 0.171429 -0.942857

14 21 -14
-2.22045e-16 175 -70
-3.10862e-15 -4.79616e-14 35

Entirely on gpu
0.857143 -0.394286 -0.626059
0.428571 0.902857 -0.127334
-0.285714 0.171429 -0.769309

14 21 -14
-2.22045e-16 175 -70
-5.19947 -7.7992 37.6962

Process returned 0 (0x0) execution time : 0.829 s
Press ENTER to continue.