https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121416
--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
For completeness, modifying OpenACC's reduction-cplx-dbl.c to use atomics, i.e.

  #pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy(tsum,tprod)
  #pragma acc loop gang
  for (int ix = 0; ix < N; ix++) {
  #pragma acc atomic update
    __real__ tsum += __real__ ary[ix];
  #pragma acc atomic update
    __imag__ tsum += __imag__ ary[ix];
  }

also yields the correct result. [Here, with atomics, the shared data is updated on every iteration, not once per thread/worker and once per team/gang as with reductions.]
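A self-contained sketch of the same atomic pattern, for anyone wanting to reproduce it outside the testsuite. Assumptions: the helper name `sum_atomic` is hypothetical, and the real and imaginary parts go into two separate `double` accumulators instead of the GNU `__real__`/`__imag__` lvalues, so each `atomic update` targets a plain scalar. Without `-fopenacc`, GCC ignores the `acc` pragmas and the loop runs serially, which gives the same sum.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical helper: sums a complex array using OpenACC atomics
   instead of a reduction clause. Every iteration performs two atomic
   read-modify-write operations on the shared accumulators. */
double complex sum_atomic(const double complex *ary, int n)
{
  assert(n >= 0);
  double re = 0.0, im = 0.0;  /* separate scalars for the two parts */

  /* copy(re, im) is needed so the atomically updated values are
     copied back; scalars would otherwise be firstprivate. */
#pragma acc parallel num_gangs(32) copyin(ary[0:n]) copy(re, im)
  {
#pragma acc loop gang
    for (int ix = 0; ix < n; ix++) {
#pragma acc atomic update
      re += creal(ary[ix]);
#pragma acc atomic update
      im += cimag(ary[ix]);
    }
  }
  return re + im * I;
}
```

The trade-off matches the note above: a reduction combines private partial sums once per worker and once per gang, whereas this version contends on the two shared scalars on every iteration. The result is correct but may be slower for large N.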