Hi Sanjay,
Could you send your modified data exchange code (psetb.F) with MPI_Waitall? See my other inline comments below. Thanks.
On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users <petsc-users@mcs.anl.gov> wrote:
Lawrence,
Thanks for taking a look! This is what I had been wondering about -- my knowledge of MPI is pretty minimal, and the origins of the routine were from a programmer we hired a decade+ back from NERSC. I'll have to look into VecScatter. It will be great to dispense with our roll-your-own routines (we even have our own reduceALL scattered around the code).
PETSc VecScatter has a very simple interface and you should definitely go with it. With VecScatter, you can think in terms of familiar vectors and indices instead of low-level MPI_Send/Recv calls. Besides that, PETSc has optimized VecScatter so that communication is efficient.
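To give a flavor of the interface, here is a rough, self-contained sketch (free-form Fortran; the sizes, ghost indices, and the name vscat_sketch are made-up placeholders, not your actual layout) that gathers a couple of entries of a distributed Vec into a small sequential Vec:

  program vscat_sketch
  #include <petsc/finclude/petscvec.h>
    use petscvec
    implicit none

    Vec            :: gvec, lvec
    IS             :: ixglobal, iylocal
    VecScatter     :: ctx
    PetscErrorCode :: ierr
    PetscInt       :: nloc, nghost, first, step
    PetscInt       :: ghostidx(2)
    PetscScalar    :: one

    call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

    nloc   = 4               ! entries owned by this process (made up)
    nghost = 2               ! off-process entries this process needs (made up)
    ghostidx(1) = 0          ! global indices we want to fetch (made up)
    ghostidx(2) = 1

    ! distributed source vector and a small sequential destination vector
    call VecCreateMPI(PETSC_COMM_WORLD, nloc, PETSC_DECIDE, gvec, ierr)
    call VecCreateSeq(PETSC_COMM_SELF, nghost, lvec, ierr)
    one = 1.0
    call VecSet(gvec, one, ierr)

    ! which global entries we want, and where they land in lvec (0..nghost-1)
    call ISCreateGeneral(PETSC_COMM_SELF, nghost, ghostidx, PETSC_COPY_VALUES, ixglobal, ierr)
    first = 0
    step  = 1
    call ISCreateStride(PETSC_COMM_SELF, nghost, first, step, iylocal, ierr)
    call VecScatterCreate(gvec, ixglobal, lvec, iylocal, ctx, ierr)

    ! create the scatter once; reuse Begin/End every time gvec changes
    call VecScatterBegin(ctx, gvec, lvec, INSERT_VALUES, SCATTER_FORWARD, ierr)
    call VecScatterEnd(ctx, gvec, lvec, INSERT_VALUES, SCATTER_FORWARD, ierr)

    call VecScatterDestroy(ctx, ierr)
    call ISDestroy(ixglobal, ierr)
    call ISDestroy(iylocal, ierr)
    call VecDestroy(gvec, ierr)
    call VecDestroy(lvec, ierr)
    call PetscFinalize(ierr)
  end program vscat_sketch

All the message scheduling and completion is hidden inside the scatter context, so the hand-written Isend/Recv/Barrier bookkeeping goes away.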
Interestingly, MPI_Waitall has solved the problem when using Open MPI, but it still persists with MPICH. Graphs attached. I'm going to run with Open MPI for now (but I guess I really still need to figure out what is wrong with MPICH and MPI_Waitall; I'll try Barry's suggestion of --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" later today and report back).
Regarding MPI_Barrier, it was put in due to a problem where some processes were finishing up sending and receiving and exiting the subroutine before the receiving processes had completed (which resulted in data loss, as the buffers are freed after the call to the routine). MPI_Barrier was the solution proposed to us. I don't think I can dispense with it, but I will think about it some more.
After MPI_Send(), or after MPI_Isend(.., req) and MPI_Wait(req), you can safely free the send buffer without worrying that the receive has not completed. MPI guarantees the receiver can still get the data, for example, through internal buffering.
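Just to illustrate the point, a tiny sketch in plain MPI Fortran (the count, tag, and ranks are made up): once MPI_Wait returns for the MPI_Isend, rank 0 can free its send buffer even though rank 1 may not have reached its MPI_Recv yet.

  program isend_wait_sketch
    use mpi
    implicit none
    integer :: ierr, rank, nproc, req, tag
    integer :: status(MPI_STATUS_SIZE)
    real(8), allocatable :: sendbuf(:), recvbuf(:)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
    tag = 10
    allocate(sendbuf(100), recvbuf(100))
    sendbuf = real(rank, 8)

    if (nproc >= 2) then
      if (rank == 0) then
        call MPI_Isend(sendbuf, 100, MPI_DOUBLE_PRECISION, 1, tag, &
                       MPI_COMM_WORLD, req, ierr)
        call MPI_Wait(req, status, ierr)
        ! MPI_Wait has returned, so MPI now owns the message (e.g. via
        ! internal buffering); the send buffer can be freed right away.
        deallocate(sendbuf)
      else if (rank == 1) then
        call MPI_Recv(recvbuf, 100, MPI_DOUBLE_PRECISION, 0, tag, &
                      MPI_COMM_WORLD, status, ierr)
      end if
    end if

    call MPI_Finalize(ierr)
  end program isend_wait_sketch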
I'm not so sure about using MPI_Irecv, as it will require a bit of rewriting: right now I process the received data sequentially after each blocking MPI_Recv -- clearly slower, but easier to code.
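If you do switch to MPI_Irecv, you do not have to give up the process-one-message-at-a-time structure: post all the receives up front, then handle each message as it completes with MPI_Waitany, in whatever order the messages arrive. A rough sketch (the neighbour pattern, message length, and tag below are made up; here every rank simply talks to every other rank):

  program irecv_waitany_sketch
    use mpi
    implicit none
    integer, parameter :: msglen = 4, tag = 20
    integer :: ierr, rank, nproc, nnbr, i, k, idx
    integer :: status(MPI_STATUS_SIZE)
    integer, allocatable :: nbr(:), sreq(:), rreq(:)
    real(8), allocatable :: sendbuf(:,:), recvbuf(:,:)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

    nnbr = nproc - 1                       ! "neighbours" = all other ranks
    allocate(nbr(nnbr), sreq(nnbr), rreq(nnbr))
    allocate(sendbuf(msglen, nnbr), recvbuf(msglen, nnbr))
    k = 0
    do i = 0, nproc - 1
      if (i /= rank) then
        k = k + 1
        nbr(k) = i
      end if
    end do
    sendbuf = real(rank, 8)

    ! post every receive and every send before waiting on anything
    do i = 1, nnbr
      call MPI_Irecv(recvbuf(1, i), msglen, MPI_DOUBLE_PRECISION, nbr(i), &
                     tag, MPI_COMM_WORLD, rreq(i), ierr)
      call MPI_Isend(sendbuf(1, i), msglen, MPI_DOUBLE_PRECISION, nbr(i), &
                     tag, MPI_COMM_WORLD, sreq(i), ierr)
    end do

    ! process each incoming message as soon as it arrives, one at a time
    do i = 1, nnbr
      call MPI_Waitany(nnbr, rreq, idx, status, ierr)
      ! ... unpack recvbuf(:, idx) here ...
    end do

    ! make sure our own sends completed before the buffers disappear
    call MPI_Waitall(nnbr, sreq, MPI_STATUSES_IGNORE, ierr)
    call MPI_Finalize(ierr)
  end program irecv_waitany_sketch

With this pattern the MPI_Waitall on the send requests replaces the MPI_Barrier, since each process only waits for its own communication to finish.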
Thanks again for the help.
-sanjay
On 5/30/19 4:48 AM, Lawrence Mitchell wrote:
> Hi Sanjay,
>
>> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>
>> The problem seems to persist but with a different signature. Graphs attached as before.
>>
>> Totals with MPICH (NB: single run)
>>
>> For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408
>> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544
>>
>> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine?
>> I would have thought that with the blocking receives and the MPI_Barrier everything would have fully completed and cleaned up before all processes exited the routine, but perhaps I am wrong on that.
>
> Skimming the Fortran code you sent, you do:
>
> for i in ...:
>     call MPI_Isend(..., req, ierr)
>
> for i in ...:
>     call MPI_Recv(..., ierr)
>
> But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created.
>
> The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do:
>
> for i in nsend:
>     call MPI_Isend(..., req[i], ierr)
> for j in nrecv:
>     call MPI_Irecv(..., req[nsend+j], ierr)
>
> call MPI_Waitall(req, ..., ierr)
>
> I note also that there's no need for the Barrier at the end of the routine: this kind of communication does neighbourwise synchronisation, so there is no need to add (unnecessary) global synchronisation too.
>
> As an aside, is there a reason you don't use PETSc's VecScatter to manage this global-to-local exchange?
>
> Cheers,
>
> Lawrence