https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121416

--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
For completeness, modifying OpenACC's reduction-cplx-dbl.c to use atomics, i.e.

#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy(tsum,tprod)
  #pragma acc loop gang
    for (int ix = 0; ix < N; ix++)
      {
#pragma acc atomic update
        __real__ tsum += __real__ ary[ix];
#pragma acc atomic update
        __imag__ tsum += __imag__ ary[ix];
      }

also yields the correct result.

[Here, with atomics, tsum is updated on every loop iteration - and not once
per thread/worker and once per team/gang as with reductions.]
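
To make the comparison concrete, here is a minimal sketch that puts the
atomic variant next to a reduction variant. It is not the actual testsuite
file: N, fill, sum_atomic, sum_reduction and check_variants are names made up
for illustration, and only the sum (not tprod) is covered. The inputs are
small integers, so the sums are exact regardless of the order in which the
updates happen. Without -fopenacc the pragmas are ignored and both loops
simply run serially.

```c
#include <assert.h>

#define N 1024  /* illustrative size, not taken from the testsuite file */

/* Fill ary with small integers so that every partial sum is exact.  */
static void fill (double _Complex *ary)
{
  for (int ix = 0; ix < N; ix++)
    {
      __real__ ary[ix] = ix;
      __imag__ ary[ix] = 2 * ix;
    }
}

/* Atomic variant: the shared tsum is updated on every iteration.  */
static double _Complex sum_atomic (const double _Complex *ary)
{
  double _Complex tsum = 0;
#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy(tsum)
  #pragma acc loop gang
    for (int ix = 0; ix < N; ix++)
      {
#pragma acc atomic update
        __real__ tsum += __real__ ary[ix];
#pragma acc atomic update
        __imag__ tsum += __imag__ ary[ix];
      }
  return tsum;
}

/* Reduction variant: partial sums are kept privately and combined
   at the end, instead of touching tsum on every iteration.  */
static double _Complex sum_reduction (const double _Complex *ary)
{
  double _Complex tsum = 0;
#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy(tsum)
  #pragma acc loop gang reduction(+:tsum)
    for (int ix = 0; ix < N; ix++)
      tsum += ary[ix];
  return tsum;
}

/* Self-check: both variants are expected to agree exactly.  */
static void check_variants (void)
{
  static double _Complex ary[N];
  fill (ary);
  double _Complex a = sum_atomic (ary);
  double _Complex r = sum_reduction (ary);
  assert (__real__ a == __real__ r);
  assert (__imag__ a == __imag__ r);
}
```

Calling check_variants () from main is enough to exercise both paths; on an
affected offload target one would expect the reduction variant to be the one
that differs, going by the report above.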
