Hi all,

We are using PETSc 3.20 in our code and successfully running several solvers on 
NVIDIA GPUs with OpenMPI libraries that are not GPU-aware (so I need to add the 
flag -use_gpu_aware_mpi 0).
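
For reference, a typical launch looks like this (the mpirun invocation and the 
executable name are just illustrative):

```shell
# Non-GPU-aware OpenMPI: tell PETSc not to use GPU-aware MPI code paths
mpirun -np 8 ./my_solver -use_gpu_aware_mpi 0 \
       -ksp_type cg -pc_type gamg -pc_gamg_type classical
```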


But now, when using a GPU-aware OpenMPI library (OpenMPI 4.0.5 or 4.1.5 from 
NVHPC), some parallel calculations fail with KSP_DIVERGED_ITS or 
KSP_DIVERGED_DTOL

with several configurations. It may run well on a small test case (the matrix 
is symmetric) with:


-ksp_type cg -pc_type gamg -pc_gamg_type classical


But with a larger number of devices, for instance more than 4 or 8, it may 
fail.


If I switch to another solver (BiCGStab), it may converge:


-ksp_type bcgs -pc_type gamg -pc_gamg_type classical


The most sensitive cases, where it diverges, are the following:

-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg

-ksp_type cg -pc_type gamg  -pc_gamg_type classical


And the bcgs workaround doesn't work every time...


It seems to work without problems with aggregation (on at least 128 GPUs in my 
simulation):

-ksp_type cg -pc_type gamg -pc_gamg_type agg


So I guess something weird is happening in my code during the PETSc solve with 
GPU-aware MPI, since all the previous configurations work with non-GPU-aware 
MPI.
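
In case it is useful, the CUDA awareness of the OpenMPI build can be confirmed 
with the standard ompi_info tool:

```shell
# A CUDA-aware build reports "true" for this MCA parameter
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```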


Here is the -ksp_view output from one failing run with the cg + hypre 
configuration:


KSP Object: () 8 MPI processes
  type: cg
  maximum iterations=10000, nonzero initial guess
  tolerances:  relative=0., absolute=0.0001, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: () 8 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.7
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down         1
      Sweeps up           1
      Sweeps on coarse    1
      Relax down          l1scaled-Jacobi
      Relax up            l1scaled-Jacobi
      Relax on coarse     Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Maximum size of coarsest grid 9
      Minimum size of coarsest grid 1
      Not using CF-relaxation
      Not using more complex smoothers.
      Measure type        local
      Coarsen type        PMIS
      Interpolation type  ext+i
      SpGEMM type         cusparse
  linear system matrix = precond matrix:
  Mat Object: () 8 MPI processes
    type: mpiaijcusparse
    rows=64000, cols=64000
    total: nonzeros=311040, allocated nonzeros=311040
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines


I have not yet managed to create a reproducer with the PETSc ex*.c examples...
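
For example, I tried runs along these lines with the 3D Laplacian tutorial 
ex45 (grid sizes and process count here are only illustrative):

```shell
# Attempted reproducer: PETSc 3D Laplacian tutorial on GPU with the failing options
cd $PETSC_DIR/src/ksp/ksp/tutorials
make ex45
mpirun -np 8 ./ex45 -da_grid_x 40 -da_grid_y 40 -da_grid_z 40 \
       -dm_mat_type aijcusparse -dm_vec_type cuda \
       -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -ksp_view
```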


Have you seen this kind of behaviour before?

Should I update my PETSc version?


Thanks for any advice,


Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°43
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
