https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958
Bug ID: 120958
Summary: tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Target Milestone: ---

Since my commit r14-5831-gaae723d360ca26 (Martin Jambor: sra: SRA of non-escaped aggregates passed by reference to calls), gcc produces a non-working MPI version of the CG benchmark from NAS Parallel Benchmarks version 3.3.1, which is written in Fortran 77. (The benchmark has been re-written in a newer version of Fortran in version 3.4 of the suite and I suspect that one no longer has this problem).

The problem is that tree-sra is told by escape analysis that the address of the first parameter of mpi_irecv does not escape. And so the aggregate passed in that parameter is an SRA candidate and is broken down into scalar components, and these are reloaded immediately after the function returns and not after a call of mpi_wait.

The reason escape analysis says so is that the fnspec of mpi_irecv is ". w w w w w w w w " which indeed says (the first w) that the first parameter does not escape. This fnspec is created by a function in gcc/fortran/trans-types.cc which, AFAICT, simply deduces it from the call statement in the benchmark source (but I may easily be wrong here).

My first impression was that this is simply a limitation of Fortran 77 and asynchronous MPI simply cannot work in this language standard. However, Richi pointed out that there must be a lot of Fortran 77 code using asynchronous MPI that we do not want to break, which is a reasonable point of view.

The benchmark can be downloaded from https://www.nas.nasa.gov/software/npb.html.
I have used mpich 4.1.2 MPI implementation from openSUSE Leap 15.6 and my configuration file config/make.def is:

----------------------------------------------------------------------
## Compiler
MPIF77 = mpif77
MPICC = mpicc

# libhugetlbfs relinking
LHBDT = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-link=BDT
LHB = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-link=B
LHALIGN = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align

# Fortran Optimisation
FLINK = mpif77
F_LIB = $(LHRELINK) $(LHLIB)
F_INC =
FFLAGS = -O3 -mcmodel=large -g -fallow-argument-mismatch -fallow-invalid-boz -m64
FLINKFLAGS = -O3 -lmpi -g -fallow-argument-mismatch -fallow-invalid-boz -mcmodel=large -m64 $(LHRELINK) $(LHLIB)

# C Optimisation
CLINK = mpicc
C_LIB = $(LHRELINK) $(LHLIB)
C_INC =
CFLAGS = -O3 -mcmodel=large -m64
CLINKFLAGS = -O3 -lmpi -mcmodel=large -m64 $(LHRELINK) $(LHLIB)

# Other
UCC = mpicc
BINDIR = ../bin
RAND = randi8
WTIME = wtime.c
----------------------------------------------------------------------

The problematic variable which is SRAed is norm_temp2 defined on line:

      double precision norm_temp1(2), norm_temp2(2)

and then used in code snippet:

      do i = 1, l2npcols
         if (timeron) call timer_start(t_ncomm)
         call mpi_irecv( norm_temp2,
     >                   2,
     >                   dp_type,
     >                   reduce_exch_proc(i),
     >                   i,
     >                   mpi_comm_world,
     >                   request,
     >                   ierr )
         call mpi_send(  norm_temp1,
     >                   2,
     >                   dp_type,
     >                   reduce_exch_proc(i),
     >                   i,
     >                   mpi_comm_world,
     >                   ierr )
         call mpi_wait( request, status, ierr )
         if (timeron) call timer_stop(t_ncomm)

         norm_temp1(1) = norm_temp1(1) + norm_temp2(1)
         norm_temp1(2) = norm_temp1(2) + norm_temp2(2)
      enddo
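
To make the failure mode concrete, the loop body after SRA behaves roughly as in the sketch below. This is only an illustration of the effect described above, not actual compiler output or a GIMPLE dump; nt2_1 and nt2_2 are hypothetical names standing in for the scalar replacements of norm_temp2.

```fortran
c     Illustrative sketch only, not actual compiler output.
c     nt2_1 and nt2_2 are hypothetical names for the scalar
c     replacements that SRA creates for norm_temp2.
         call mpi_irecv( norm_temp2, 2, dp_type,
     >                   reduce_exch_proc(i), i,
     >                   mpi_comm_world, request, ierr )
c     The scalars are reloaded here, right after mpi_irecv
c     returns, while the receive may still be in flight.
         nt2_1 = norm_temp2(1)
         nt2_2 = norm_temp2(2)
         call mpi_send( norm_temp1, 2, dp_type,
     >                  reduce_exch_proc(i), i,
     >                  mpi_comm_world, ierr )
         call mpi_wait( request, status, ierr )
c     Only now is the received data guaranteed to be in
c     norm_temp2, but the stale scalar copies are used.
         norm_temp1(1) = norm_temp1(1) + nt2_1
         norm_temp1(2) = norm_temp1(2) + nt2_2
```

Since the fnspec claims the buffer address does not escape through mpi_irecv, nothing tells the optimizer that mpi_wait can modify norm_temp2, so the reload is never repeated after the wait.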