I am trying to track down a memory issue with my code; apologies in advance for the longish message.

I am solving an FEA problem with a number of load steps, involving about 3000
right-hand-side and tangent assemblies and solves.  The program is mainly Fortran, with a C memory allocator.

When I run my code in strictly serial mode (no PETSc or MPI routines), the memory stays constant over the whole run.

When I run it in parallel mode with PETSc solvers and num_processes=1, the memory (max resident set size) also stays constant:

PetscMalloc = 28,976, ProgramNativeMalloc = constant, Resident Size = 24,854,528 (constant) [CG/JACOBI]

[PetscMalloc and Resident Size are as reported by PetscMallocGetCurrentUsage and PetscMemoryGetCurrentUsage (summed across processes as needed);
ProgramNativeMalloc is as reported by the program's own memory allocator.]
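For reference, the sampling is done along these lines (simplified C sketch; the real code drives this from Fortran, and the routine name here is just illustrative):

#include <petscsys.h>

/* Sample PETSc's own allocation count and the process resident set size,
   and sum them onto rank 0 for reporting (simplified sketch). */
PetscErrorCode ReportMemory(MPI_Comm comm)
{
  PetscErrorCode ierr;
  PetscLogDouble mal, rss, mal_sum, rss_sum;

  ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr);  /* bytes currently allocated via PetscMalloc */
  ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);  /* resident set size of this process */
  /* PetscLogDouble is a double, so MPI_DOUBLE/MPI_SUM is safe here */
  ierr = MPI_Reduce(&mal, &mal_sum, 1, MPI_DOUBLE, MPI_SUM, 0, comm);CHKERRQ(ierr);
  ierr = MPI_Reduce(&rss, &rss_sum, 1, MPI_DOUBLE, MPI_SUM, 0, comm);CHKERRQ(ierr);
  ierr = PetscPrintf(comm, "PetscMalloc = %.0f  Resident Size = %.0f\n", mal_sum, rss_sum);CHKERRQ(ierr);
  return 0;
}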

When I run it in parallel mode with PETSc solvers and num_processes=2, the resident memory grows steadily during the run:

PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant, Resident Size = 24,698,880 (start) -> 31,313,920 (finish) [CG/JACOBI]

When I run it in parallel mode with PETSc solvers and num_processes=4, the resident memory grows steadily during the run:

PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = 45,801,472 (start) -> 70,787,072 (finish) [CG/JACOBI]
PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = 52,076,544 (start) -> 112,410,624 (finish) [GMRES/BJACOBI]
PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = 381,480,960 (start) -> 712,798,208 (finish) [SUPERLU]
PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = 278,671,360 (start) -> 591,048,704 (finish) [MUMPS]

The memory growth feels alarming but maybe I do not understand the values in ru_maxrss from getrusage().
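For reference, the ru_maxrss value comes from a call like the following (simplified C sketch). As far as I understand, ru_maxrss is a high-water mark that never decreases during a run, and its units are platform dependent (bytes on macOS, kilobytes on Linux):

#include <sys/resource.h>

/* Return the peak resident set size of this process as reported by the OS.
   ru_maxrss is a high-water mark (it never goes down); units are bytes on
   macOS and kilobytes on Linux. */
long MaxResidentSetSize(void)
{
  struct rusage ru;
  getrusage(RUSAGE_SELF, &ru);
  return ru.ru_maxrss;
}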

My box (a MacBook Pro) has a broken Valgrind, so I need to get to a system with a functional one; that said, the code has always been Valgrind-clean. There are no Fortran pointers or allocatable arrays in the part of the code being used.  The program's C memory allocator keeps track of itself, so I do not see the problem there.  The PETSc malloc is also steady.

Other random hints:

1) If I comment out the calls to KSPSolve and to my MPI data-exchange routine (which passes solution values between processes after each solve using MPI_Isend, MPI_Recv, and MPI_BARRIER; a sketch appears after this list), the memory growth essentially goes away.

2) If I comment out the call to my MPI data-exchange routine but leave the call to KSPSolve, the problem remains but is substantially reduced for CG/JACOBI and marginally reduced for the GMRES/BJACOBI, SUPERLU, and MUMPS runs.

3) If I comment out the call to KSPSolve but leave the call to my MPI data-exchange routine, the problem remains.
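
For context, the data-exchange routine in 1) has roughly the shape below (illustrative C sketch of the Fortran routine; the buffer/neighbor names and the explicit MPI_Waitall on the send requests are just how I have drawn it here, not a copy of the actual code):

#include <stdlib.h>
#include <mpi.h>

/* Simplified sketch: after each solve, send my shared solution values to each
   neighboring process with MPI_Isend, receive theirs with MPI_Recv, complete
   the nonblocking sends, and synchronize. (If the MPI_Isend requests were
   never waited on or freed, the MPI library would accumulate request objects.) */
void ExchangeSolution(double **sendbuf, double **recvbuf, const int *counts,
                      const int *neighbors, int nneigh, MPI_Comm comm)
{
  MPI_Request *reqs = (MPI_Request *) malloc((size_t) nneigh * sizeof(MPI_Request));
  int i;

  for (i = 0; i < nneigh; i++)
    MPI_Isend(sendbuf[i], counts[i], MPI_DOUBLE, neighbors[i], 0, comm, &reqs[i]);
  for (i = 0; i < nneigh; i++)
    MPI_Recv(recvbuf[i], counts[i], MPI_DOUBLE, neighbors[i], 0, comm, MPI_STATUS_IGNORE);
  MPI_Waitall(nneigh, reqs, MPI_STATUSES_IGNORE);
  MPI_Barrier(comm);
  free(reqs);
}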

Any suggestions/hints about where to look would be great.

-sanjay

