https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
Bug ID: 97203
Summary: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Keywords: openmp, wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: burnus at gcc dot gnu.org
CC: vries at gcc dot gnu.org
Target Milestone: ---
Target: nvptx

Created attachment 49269
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49269&action=edit
C testcase - compile with -fopenmp and "-O0", "-O1", and "-O1 -funsafe-math-optimizations"

My impression is that this is again (→ PR95654) related to SIMT going somehow wrong, but I do not quite understand why.

The code uses 'omp simd ... reduction(…)'; using 'omp parallel do ...' instead works.

The big program works at -O0 and fails with -O1/-O2, but starts working again if -ffast-math is used in addition. The failure is:

  libgomp: cuCtxSynchronize error: invalid program counter
or
  libgomp: cuCtxSynchronize error: unspecified launch failure (perhaps abort was called)

The attached program is a vastly reduced version, which shows a similar failure and a similar pattern, and which may or may not have the same cause.

In any case: it uses 'omp simd' and, hence, nvptx's SIMT, and inside 'omp simd' it has:

  float cosArg = __builtin_cosf(expArg);
  float sinArg = __builtin_sinf(expArg);

It works with -O0 and also with -O1/-O2 -funsafe-math-optimizations. With -funsafe-math-optimizations the code contains:

  cos.approx.f32 %r73, %r75;
  sin.approx.f32 %r72, %r75;

and with -O0 (and unsafe math disabled):

  call (%value_in), cosf, (%out_arg1);
  call (%value_in), sinf, (%out_arg1);

But with -O1/-O2 it fails with:

  libgomp: cuCtxSynchronize error: an illegal memory access was encountered

Here, the sin/cos pair was turned into BUILT_IN_SINCOSF and we end up with the code:

  call cexpf, (%out_arg1, %out_arg2, %out_arg3);

I have no idea why 'call cosf/sinf' inside 'omp simd' works but 'call cexpf' fails, nor whether that is indeed related to SIMT.
I think there are two issues. Mainly:

FIRST ISSUE: Why does it fail with 'cexpf'?

* * *

SECOND ISSUE: Missed optimization for BUILT_IN_SINCOSF:

  if (optab_handler (sincos_optab, mode) != CODE_FOR_nothing)
    ...
  else if (targetm.libc_has_function (function_sincos))
    ...
  else
    ...
      fn = builtin_decl_explicit (BUILT_IN_CEXPF);

It seems that we take the last branch. In newlib's ./newlib/libm/complex/cexpf.c:

  cexpf(float complex z)
  ...
    x = crealf(z);
    y = cimagf(z);
    r = expf(x);
    w = r * cosf(y) + r * sinf(y) * I;
    return w;

which is not really a performance boost compared to just calling sinf/cosf ...

Note that newlib does have newlib/libm/math/wf_sincos.c, which does:

  void sincosf(float x, float *sinx, float *cosx)
  {
    *sinx = sinf (x);
    *cosx = cosf (x);
  }

This avoids a bunch of '*' and '+' operations and, in particular, an 'expf' call. (It should still be slower than directly calling sinf/cosf due to the call overhead, but much better than cexpf, unless implemented in hardware.)