reduction-cplx-dbl.c produces wrong gang-reduction result

burnus at gcc dot gnu.org via Gcc-bugs Tue, 05 Aug 2025 12:27:48 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121416


Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tschwinge at gcc dot gnu.org

--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> ---
It works with OpenMP:

#define N 100
static _Complex double ary[100], sum = 0, prod = 1;

void team_call () {
  _Complex double tsum = 0, tprod = 1;
  #pragma omp target teams distribute parallel for map(to: ary) \
          reduction(+:tsum) reduction(*:tprod)
    for (int ix = 0; ix < N; ix++)
      {
        tsum += ary[ix];
        tprod *= ary[ix];
      }
  __builtin_printf("debug:\n%f + i %f\n%f + i %f\n", __real__ prod,
                   __imag__ prod, __real__ tprod, __imag__ tprod);
}

int main() {
  for (int ix = 0; ix < N;  ix++)
    {
      double frac = ix * (1.0 / 1024) + 1.0;
      ary[ix] = frac + frac * 2.0j - 1.0j;
      sum += ary[ix];
      prod *= ary[ix];
    }
  team_call ();
}
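
A parallel reduction may combine the elements in a different order than the sequential loop in main, so the two results can differ in the last bits. To check the result instead of only printing it, a tolerance comparison along these lines could be used. This is a sketch only: the helper name close_enough and the 1e-9 relative tolerance are assumptions of mine and are not taken from the testcase.

```c
/* Hypothetical helper: return nonzero if A and B agree to within a
   relative tolerance.  Uses the GNU __real__/__imag__ extensions,
   as the reproducer above does, so no libm is needed.  */
static int
close_enough (_Complex double a, _Complex double b)
{
  double dr = __real__ a - __real__ b;
  double di = __imag__ a - __imag__ b;
  double err2 = dr * dr + di * di;            /* |a - b|^2 */
  double mag2 = __real__ b * __real__ b
                + __imag__ b * __imag__ b;    /* |b|^2 */
  const double rel = 1e-9;                    /* assumed tolerance */

  /* Compare squared magnitudes to avoid needing a square root.  */
  return err2 <= rel * rel * mag2;
}
```

In team_call, this could be called as close_enough (tprod, prod), and likewise for the sum if it is passed in or made visible there.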

* * *

Note that the generated code uses
  GOMP_atomic_start ();
  …
  GOMP_atomic_end ();

and no ISA …_atomic intrinsic.

The two functions are in libgomp/atomic.c:

/* This mutex is used when atomic operations don't exist for the target
   in the mode requested.  The result is not globally atomic, but works so
   long as all parallel references are within #pragma omp atomic directives.
   According to responses received from o...@openmp.org, appears to be within
   spec.  Which makes sense, since that's how several other compilers
   handle this situation as well.  */
void
GOMP_atomic_start (void) {
  gomp_mutex_lock (&atomic_lock);
}

void
GOMP_atomic_end (void) {
  gomp_mutex_unlock (&atomic_lock);
}
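
To illustrate what this fallback amounts to, here is a minimal host-side sketch in which both reduction updates are done under one lock. The pthread mutex is a stand-in I chose for the sketch, not what libgomp uses internally, and the names sketch_lock and locked_accumulate are hypothetical.

```c
#include <pthread.h>

/* Stand-in for libgomp's atomic_lock (assumption: a plain pthread
   mutex is good enough to show the idea).  */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: fold X into the shared accumulators.  Every
   update goes through the same lock, which is what makes the
   read-modify-write sequences safe against each other.  */
static void
locked_accumulate (_Complex double *sum, _Complex double *prod,
                   _Complex double x)
{
  pthread_mutex_lock (&sketch_lock);     /* role of GOMP_atomic_start */
  *sum += x;
  *prod *= x;
  pthread_mutex_unlock (&sketch_lock);   /* role of GOMP_atomic_end */
}
```

As the quoted comment says, this is only safe as long as every concurrent access to the accumulators takes the same lock.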

The same GOMP_atomic_start/GOMP_atomic_end code is generated for 'double' if
there are two reductions, and for a single complex double reduction. If there
is only one double reduction, however, it uses (twice):

flat_atomic_cmpswap_X2  v[8:9], v[4:5], v[16:19]  sc0   ; tmp775,* _45, tmp776
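
A 64-bit compare-and-swap such as this is normally used in a retry loop. As a rough host-side illustration (a C11 sketch, not the code GCC generates; the helper names atomic_add_double and load_double are hypothetical), an atomic add on a double can be built from it like this:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: atomically add VAL to the double whose bit
   pattern is stored in *SLOT, using only a 64-bit compare-and-swap.
   Returns the value that was stored.  */
static double
atomic_add_double (_Atomic uint64_t *slot, double val)
{
  uint64_t old_bits = atomic_load (slot);
  for (;;)
    {
      double old_val, new_val;
      uint64_t new_bits;

      memcpy (&old_val, &old_bits, sizeof old_val);
      new_val = old_val + val;
      memcpy (&new_bits, &new_val, sizeof new_bits);

      /* On failure, old_bits is refreshed with the current contents
         of *SLOT and the update is recomputed.  */
      if (atomic_compare_exchange_weak (slot, &old_bits, new_bits))
        return new_val;
    }
}

/* Hypothetical helper: read the slot back as a double.  */
static double
load_double (_Atomic uint64_t *slot)
{
  uint64_t bits = atomic_load (slot);
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}
```

A _Complex double is typically 16 bytes, so one 64-bit compare-and-swap cannot update it as a unit. For '*' the new real part depends on both the old real and the old imaginary part, so updating the two halves separately is not equivalent to one atomic update.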

* * *

With OpenACC, I see two
   flat_atomic_cmpswap     v0, v[4:5], v[16:17]  sc0
but no GOMP… function call.

* * *

I wonder whether OpenACC needs something similar to what is done for OpenMP.

Quoting gcc/omp-expand.cc:

/* Expand an GIMPLE_OMP_ATOMIC statement.  We try to expand
   using expand_omp_atomic_fetch_op.  If it failed, we try to
   call expand_omp_atomic_pipeline, and if it fails too, the
   ultimate fallback is wrapping the operation in a mutex
   (expand_omp_atomic_mutex).  REGION is the atomic region built
   by build_omp_regions_1().  */

static void
expand_omp_atomic (struct omp_region *region)

[Bug target/121416] [gcn][MI300][CDNA3] libgomp.oacc-c-c++-common/reduction-cplx-dbl.c produces wrong gang-reduction result
