https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150
--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> ---
* Your compilation uses "-O0" – I do not know whether that's intended.
* I did not see any timeout message, although it did take a while to run
with offloading. (See timing results below.)
I wonder what causes the problem you are seeing.
You could try whether setting the environment variable
GOMP_DEBUG=1
shows some useful details for the launch.
* The OpenACC test case is wrong: "c" has to be "copy", not "copyout",
because its initial value is used. With "copyout", the device copy of "c"
starts out uninitialized (→ NaN).
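To illustrate the copy/copyout point, here is a minimal sketch in C. It is not the test case from this report: the function name "accumulate", the arrays "a" and "b", and the update itself are assumptions made up for the example. The only point is that "c" is read before it is written, so it needs "copy".

```c
#include <stddef.h>

/* Hypothetical kernel (not the reporter's code): c[i] is read and then
   updated, so its host values must reach the device before the loop.
   copy(c)    = copy host->device on entry and device->host on exit.
   copyout(c) = only device->host on exit; the device copy would start
                uninitialized and "+=" would read garbage (e.g. NaN). */
void accumulate(size_t n, const float *a, const float *b, float *c)
{
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copy(c[0:n])
    for (size_t i = 0; i < n; ++i)
        c[i] += a[i] * b[i];
}
```

Without -fopenacc the pragma is ignored and the loop runs on the host, where the two clauses cannot be told apart; the difference only shows up once the region is actually offloaded.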
On the technical side, when launching the kernel, libgomp calls:
cuLaunchKernel
and when that has succeeded, it calls
cuCtxSynchronize
and if that fails, the error message is printed with
cuda_error
which gives the time-out message:
libgomp: cuCtxSynchronize error: the launch timed out and was terminated
I added a ", sum(c)" to the print output and did some tests. Each line
below shows the compile flags, the reported run time, and sum(c):
On AMDGCN:
== -O0 == 3.56800008 268048112.
== -Ofast == 0.109999999 268698816.
== -fopenmp -O0 == 193.227997 268186448.
== -fopenmp -Ofast == 43.1559982 268455872.
== -fopenacc -O0 == 186.399002 268531136.
== -fopenacc -Ofast == 43.4970016 268206464.
== -fopenmp -foffload=disable -O0 == 7.27299976 268241776.
== -fopenmp -foffload=disable -Ofast == 1.49000001 268171680.
On NVIDIA:
== -O0 == 8.00599957 268253520.
== -Ofast == 0.254999995 268399056.
== -fopenmp -O0 == 64.2089996 268092608.
== -fopenmp -Ofast == 33.6360016 268359952.
== -fopenacc -O0 == 0.861999989 NaN (see the copy/copyout note above)
== -fopenacc -Ofast == 0.300000012 NaN (see the copy/copyout note above)
== -fopenmp -foffload=disable -O0 == 15.2220001 268511968.
== -fopenmp -foffload=disable -Ofast == 3.52900004 268573568.
== -fopenacc -foffload=disable -O0 == 14.5790005 268442496.
== -fopenacc -foffload=disable -Ofast == 4.41099977 268511968.