https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958

            Bug ID: 120958
           Summary: tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in
                    Fortran 77 because of wrong fnspec
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Since my commit r14-5831-gaae723d360ca26 (Martin Jambor: sra: SRA of
non-escaped aggregates passed by reference to calls), gcc produces a
non-working MPI version of the CG benchmark from NAS Parallel
Benchmarks version 3.3.1, which is written in Fortran 77.  (The
benchmark was rewritten in a newer version of Fortran for version
3.4 of the suite, and I suspect that version no longer has this problem.)

The problem is that escape analysis tells tree-sra that the address
of the first parameter of mpi_irecv does not escape.  The aggregate
passed in that parameter therefore becomes an SRA candidate and is
broken down into scalar components.  These are reloaded immediately
after mpi_irecv returns rather than after the call to mpi_wait, so
they never see the data that the asynchronous receive stores into the
buffer later.

The reason escape analysis says so is that the fnspec of mpi_irecv
is ". w w w w w w w w ", whose first w indeed says that the first
parameter does not escape.

This fnspec is created by a function in gcc/fortran/trans-types.cc
which, AFAICT, simply deduces it from the call statement in the
benchmark source (but I may easily be wrong here).

My first impression was that this is a limitation of Fortran 77 and
that asynchronous MPI simply cannot work in this language standard.
However, Richi pointed out that there must be a lot of Fortran 77 code
using asynchronous MPI that we do not want to break, which is a
reasonable point of view.

The benchmark can be downloaded from
https://www.nas.nasa.gov/software/npb.html.  I used the mpich 4.1.2
MPI implementation from openSUSE Leap 15.6, and my configuration file
config/make.def is:

----------------------------------------------------------------------
## Compiler
MPIF77             = mpif77
MPICC              = mpicc

# libhugetlbfs relinking
LHBDT = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-link=BDT
LHB = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-link=B
LHALIGN = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align

# Fortran Optimisation
FLINK              = mpif77
F_LIB              = $(LHRELINK) $(LHLIB)
F_INC              =
FFLAGS             = -O3  -mcmodel=large  -g -fallow-argument-mismatch -fallow-invalid-boz -m64
FLINKFLAGS         = -O3  -lmpi -g -fallow-argument-mismatch -fallow-invalid-boz -mcmodel=large -m64 $(LHRELINK) $(LHLIB)

# C Optimisation
CLINK              = mpicc
C_LIB              = $(LHRELINK) $(LHLIB)
C_INC              =
CFLAGS             = -O3  -mcmodel=large -m64
CLINKFLAGS         = -O3  -lmpi -mcmodel=large -m64 $(LHRELINK) $(LHLIB)

# Other
UCC                = mpicc
BINDIR             = ../bin
RAND               = randi8
WTIME              = wtime.c
----------------------------------------------------------------------

The problematic variable that gets SRAed is norm_temp2, declared on the line:

      double precision   norm_temp1(2), norm_temp2(2)

and then used in this code snippet:

         do i = 1, l2npcols
            if (timeron) call timer_start(t_ncomm)
            call mpi_irecv( norm_temp2,
     >                      2,
     >                      dp_type,
     >                      reduce_exch_proc(i),
     >                      i,
     >                      mpi_comm_world,
     >                      request,
     >                      ierr )
            call mpi_send(  norm_temp1,
     >                      2,
     >                      dp_type,
     >                      reduce_exch_proc(i),
     >                      i,
     >                      mpi_comm_world,
     >                      ierr )
            call mpi_wait( request, status, ierr )
            if (timeron) call timer_stop(t_ncomm)

            norm_temp1(1) = norm_temp1(1) + norm_temp2(1)
            norm_temp1(2) = norm_temp1(2) + norm_temp2(2)
         enddo
