https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102424
            Bug ID: 102424
           Summary: OpenACC 'reduction' with outer 'loop seq', inner 'loop gang'
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: openacc
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tschwinge at gcc dot gnu.org
                CC: frederik at gcc dot gnu.org
  Target Milestone: ---

Working on OpenACC 'kernels', Frederik noticed that
'libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90'
(<https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90;hb=ffbdd78a4a84d80a5303d4f7a20553cf96954db9>)
misbehaves if put into OpenACC 'parallel' form as follows:

    @@ -16,13 +17,13 @@ subroutine bar(vol)
       INTEGER :: vol
       INTEGER :: j,k
    -  !$ACC KERNELS
    -  !$ACC LOOP REDUCTION(+:vol)
    +  !$ACC PARALLEL
    +  !$ACC LOOP SEQ REDUCTION(+:vol)
       DO k=1,2
    -     !$ACC LOOP REDUCTION(+:vol)
    +     !$ACC LOOP GANG VECTOR REDUCTION(+:vol)
          DO j=1,2
            vol = vol + 1
          ENDDO
       ENDDO
    -  !$ACC END KERNELS
    +  !$ACC END PARALLEL
     end subroutine bar

(Unusual here is the outer 'loop' with 'seq' clause.)

GCC accepts this without diagnostic -- but produces unexpected (wrong?)
results at runtime!  (Though, not 100 %...)

It seems that generally this can be cured by avoiding gang parallelism in the
inner loop.

The problem can also be cured by putting an explicit 'reduction(+:vol)' clause
onto the compute construct itself (instead of the implicit 'copy(vol)' clause
that the current GCC implementation supplies) -- and I can see how that
triggers different ("proper") handling of 'var' as a reduction variable at the
top level in the compute region.

In <https://github.com/OpenACC/openacc-spec/issues/410> (only visible to
members of the GitHub OpenACC organization) I'm discussing whether this is a
quality of implementation issue or a specification issue.
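
For illustration, here is a minimal standalone sketch of the second workaround, with the explicit 'reduction(+:vol)' clause on the 'parallel' construct. It is not the testsuite file. The program name, the zero initialization of 'vol', and the final check against 4 (the value the two nested 2-iteration loops compute when run serially) are assumptions added here so the example is self-contained.

```fortran
! Sketch only: reduction clause placed on the compute construct itself,
! as described in the report. The driver around 'bar' is hypothetical.
program reduction_workaround
  implicit none
  integer :: vol

  vol = 0
  call bar(vol)
  ! Assumed expectation: 2 outer x 2 inner iterations, each adding 1.
  if (vol /= 4) stop 1
  print *, vol

contains

  subroutine bar(vol)
    integer, intent(inout) :: vol
    integer :: j, k

    ! Explicit reduction here replaces the implicit 'copy(vol)'.
    !$ACC PARALLEL REDUCTION(+:vol)
    !$ACC LOOP SEQ REDUCTION(+:vol)
    do k = 1, 2
      !$ACC LOOP GANG VECTOR REDUCTION(+:vol)
      do j = 1, 2
        vol = vol + 1
      end do
    end do
    !$ACC END PARALLEL
  end subroutine bar

end program reduction_workaround
```

This should build with 'gfortran -fopenacc'. Without that option the directives are treated as comments and the loops run serially. The other workaround mentioned above would instead keep the original 'parallel' directive and drop 'gang' from the inner 'loop'.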