https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Ok, let's first make a runnable test-case:
...
$ cat src/libgomp/testsuite/libgomp.oacc-c/test.c
#include <stdio.h>

#define TYPE float

TYPE a = 1;
TYPE b = 2;

int
main (void)
{

  printf ("A: %f\n", a);

#pragma acc parallel num_gangs (1) num_workers (1) copy (a, b)
#pragma acc atomic update
  a += b;

  printf ("A: %f\n", a);

  return !(a == 3);
}
...

Indeed, we see the compare-and-swap (cas) in the generated PTX, but that has
nothing to do with what the nvptx port supports:
...
                atom.cas.b32    %r29, [%r25], %r22, %r28;
...

This appears already at ompexp on the host, where we expand:
...
  #pragma omp atomic_load relaxed
    D.2555 = *D.2568

  <bb 4> :
  D.2557 = D.2555 + b.1;
  #pragma omp atomic_store relaxed (D.2557)
...
into:
...
  D.2583 = __atomic_load_4 (D.2582, 0);
  D.2584 = D.2583;

  <bb 4> :
  D.2585 = VIEW_CONVERT_EXPR<float>(D.2584);
  D.2586 = D.2585 + b.1;
  D.2587 = VIEW_CONVERT_EXPR<unsigned int>(D.2586);
  D.2588 = __sync_val_compare_and_swap_4 (D.2582, D.2584, D.2587);
...
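
For readers less familiar with the dump syntax, the expanded sequence above
corresponds roughly to the hand-written C below.  This is only a sketch: the
function name atomic_add_float_cas is made up for illustration, the retry loop
is my reading of how the compare-and-swap result is used (the dump excerpt
stops before the loop-back edge), and memcpy stands in for VIEW_CONVERT_EXPR.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC or libgomp: a float "a += b" done the
   way the host expansion does it, via a 32-bit integer compare-and-swap.
   Note: like the expansion, this accesses the float's storage as a 32-bit
   integer; it relies on GCC's atomic builtins and on float being 32 bits.  */
static float
atomic_add_float_cas (float *addr, float val)
{
  uint32_t *p = (uint32_t *) addr;

  /* Counterpart of __atomic_load_4 (D.2582, 0): relaxed load of the bits.  */
  uint32_t expected = __atomic_load_n (p, __ATOMIC_RELAXED);

  for (;;)
    {
      float old_f, new_f;
      uint32_t desired, seen;

      /* VIEW_CONVERT_EXPR<float>: reinterpret the bits as a float.  */
      memcpy (&old_f, &expected, sizeof old_f);
      new_f = old_f + val;
      /* VIEW_CONVERT_EXPR<unsigned int>: back to raw bits.  */
      memcpy (&desired, &new_f, sizeof desired);

      /* Counterpart of __sync_val_compare_and_swap_4.  */
      seen = __sync_val_compare_and_swap (p, expected, desired);
      if (seen == expected)
        return new_f;   /* The swap took effect.  */
      expected = seen;  /* Another thread intervened; retry.  */
    }
}
```

With a = 1 and b = 2 as in the test-case, a call such as
atomic_add_float_cas (&a, b) leaves 3 in a.  The point is that once the update
has been lowered to this shape, the offload compiler only sees an integer
compare-and-swap, not a floating-point add.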

This is part of a generic problem with offloading, where choices are made in
the host compiler which are suboptimal or even unsupported in the offload
compiler.

Ideally this should be addressed in the host compiler.

It may be possible to address this in the nvptx port by trying to detect the
suboptimal pattern and converting it to the optimal atom.add.f32.  But
ultimately that's a workaround, and it's better to fix this at the source.
