[Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

burnus at gcc dot gnu.org via Gcc-bugs Fri, 25 Sep 2020 01:52:10 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203


            Bug ID: 97203
           Summary: [nvptx] 'illegal memory access was encountered' with
                    'omp simd'/SIMT and cexpf call
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: openmp, wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: vries at gcc dot gnu.org
  Target Milestone: ---
            Target: nvptx

Created attachment 49269
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49269&action=edit
C testcase - compile with -fopenmp and "-O0", "-O1", and "-O1
-funsafe-math-optimizations"

My impression is that this is again (→ PR95654) related to SIMT going somehow
wrong, but I do not quite understand why.


The code uses 'omp simd ... reduction(…)'; using 'omp parallel do ...' instead
works.


The full program works at -O0, fails with -O1/-O2, but works again if
-ffast-math is additionally used. The failure is:
  libgomp: cuCtxSynchronize error: invalid program counter
or
  libgomp: cuCtxSynchronize error: unspecified launch failure (perhaps abort was called)


The attached program is a vastly reduced version with a similar failure and a
similar pattern; it may or may not have the same cause. In any case:

It uses 'omp simd' and, hence, nvptx's SIMT; inside the 'omp simd' loop it has:
            float cosArg = __builtin_cosf(expArg);
            float sinArg = __builtin_sinf(expArg);

With -O0, and also with -O1/-O2 plus -funsafe-math-optimizations, it works.
With -funsafe-math-optimizations the code contains:
                cos.approx.f32  %r73, %r75;
                sin.approx.f32  %r72, %r75;
and with -O0 (and unsafe math disabled):
                call (%value_in), cosf, (%out_arg1);
                call (%value_in), sinf, (%out_arg1);

But with -O1/-O2 it fails with:
   libgomp: cuCtxSynchronize error: an illegal memory access was encountered
Here, the sin/cos pair was turned into BUILT_IN_SINCOSF, and we end up with:
   call cexpf, (%out_arg1, %out_arg2, %out_arg3);


I have no idea why 'call cosf/sinf' inside 'omp simd' works but 'call cexpf'
fails – nor whether that is indeed related to SIMT.


I think there are two issues. Mainly:

FIRST ISSUE: Why does it fail with 'cexpf'?

 * * *

SECOND ISSUE: Missed optimization for BUILT_IN_SINCOSF:

  if (optab_handler (sincos_optab, mode) != CODE_FOR_nothing)
...
  else if (targetm.libc_has_function (function_sincos))
...
  else
...
        fn = builtin_decl_explicit (BUILT_IN_CEXPF);


It seems we take the last branch. In newlib's ./newlib/libm/complex/cexpf.c:

cexpf(float complex z)
...
        x = crealf(z);
        y = cimagf(z);
        r = expf(x);
        w = r * cosf(y) + r * sinf(y) * I;
        return w;

which is not really a performance boost compared to just calling sinf/cosf ...

Note that newlib does have newlib/libm/math/wf_sincos.c which does:

void sincosf(float x, float *sinx, float *cosx)
{
  *sinx = sinf (x);
  *cosx = cosf (x);
}

This avoids a bunch of '*' and '+' and, in particular, an 'expf' call. (It
should still be slower than directly calling sinf/cosf due to the call
overhead, but much better than cexpf, unless implemented in hardware.)

