Did you "fix" the problem with OpenMPI 5 but keep PETSc unchanged (i.e., still 3.20)?
--Junchao Zhang

On Tue, Sep 17, 2024 at 9:47 AM LEDAC Pierre <pierre.le...@cea.fr> wrote:

Thanks Satish, and nice guess for OpenMPI 5!

It seems to solve the issue (at least on my GPU box, where I reproduced the problem with 8 MPI ranks and OpenMPI 4.x).

Unfortunately, none of the clusters we currently use provide an OpenMPI 5.x module, so it seems I will need to build it myself to really confirm.

We will probably prevent users from configuring our code with CUDA-aware OpenMPI 4.x, because it is a really weird bug.

Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°43
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79

------------------------------
From: Satish Balay <balay....@fastmail.org>
Sent: Tuesday, September 17, 2024, 15:39:22
To: LEDAC Pierre
Cc: Junchao Zhang; petsc-users; ROUMET Elie
Subject: Re: [petsc-users] [MPI GPU Aware] KSP_DIVERGED

On Tue, 17 Sep 2024, LEDAC Pierre wrote:

> Thanks all, I will try and report.
>
> Last question: if I use the "-use_gpu_aware_mpi 0" flag with a GPU-aware MPI library, does PETSc disable GPU intra/inter communications and send the MPI buffers as usual (with extra Device<->Host copies)?

Yes.

Note: with respect to using MPI that is not GPU-aware, we are changing the default behavior so that the "-use_gpu_aware_mpi 0" flag is no longer required.

https://gitlab.com/petsc/petsc/-/merge_requests/7813

Satish
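For context, a minimal sketch of the two code paths being discussed above. This is plain MPI + CUDA rather than PETSc source code; the function name send_vector and the per-message pinned host buffer are illustrative only (a real implementation would reuse buffers):

/* Sketch of the difference between the GPU-aware send path and the
 * host-staging fallback selected by -use_gpu_aware_mpi 0. */
#include <mpi.h>
#include <cuda_runtime.h>

void send_vector(double *d_buf, int n, int dest, int gpu_aware, MPI_Comm comm)
{
  if (gpu_aware) {
    /* GPU-aware path: MPI reads directly from device memory */
    MPI_Send(d_buf, n, MPI_DOUBLE, dest, 0, comm);
  } else {
    /* Fallback path: stage through a pinned host buffer, then send */
    double *h_buf;
    cudaMallocHost((void **)&h_buf, n * sizeof(double));
    cudaMemcpy(h_buf, d_buf, n * sizeof(double), cudaMemcpyDeviceToHost);
    MPI_Send(h_buf, n, MPI_DOUBLE, dest, 0, comm);
    cudaFreeHost(h_buf);
  }
}

Both paths move the same data, so running with -use_gpu_aware_mpi 0 (the host-staging path) is a useful correctness baseline when the GPU-aware path is suspected, at the cost of the extra copies.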
________________________________
From: Satish Balay <balay....@fastmail.org>
Sent: Monday, September 16, 2024, 18:57:02
To: Junchao Zhang
Cc: LEDAC Pierre; petsc-users@mcs.anl.gov; ROUMET Elie
Subject: Re: [petsc-users] [MPI GPU Aware] KSP_DIVERGED

And/or: try the latest OpenMPI [or MPICH] and see if that makes a difference.

--download-mpich or --download-openmpi with the latest PETSc should build GPU-aware MPI.

Satish

On Mon, 16 Sep 2024, Junchao Zhang wrote:

> Could you try petsc/main to see if the problem persists?
>
> --Junchao Zhang

On Mon, Sep 16, 2024 at 10:51 AM LEDAC Pierre <pierre.le...@cea.fr> wrote:

Hi all,

We are using PETSc 3.20 in our code and successfully running several solvers on NVIDIA GPUs with an OpenMPI library that is not GPU-aware (so I need to add the flag -use_gpu_aware_mpi 0).

But now, when using a GPU-aware OpenMPI library (OpenMPI 4.0.5 or 4.1.5 from NVHPC), some parallel calculations fail with KSP_DIVERGED_ITS or KSP_DIVERGED_DTOL in several configurations. It may run well on a small test case (the matrix is symmetric) with:

-ksp_type cg -pc_type gamg -pc_gamg_type classical

But as soon as the number of devices gets larger than, for instance, 4 or 8, it may fail.

If I switch to another solver (BiCGStab), it may converge:

-ksp_type bcgs -pc_type gamg -pc_gamg_type classical

The most sensitive cases, where it diverges, are the following:

-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg
-ksp_type cg -pc_type gamg -pc_gamg_type classical

And the bcgs workaround does not work every time...

It seems to work without problems with aggregation (at least up to 128 GPUs in my simulation):

-ksp_type cg -pc_type gamg -pc_gamg_type agg

So I guess something weird is happening in my code during the PETSc solve with GPU-aware MPI, since all the previous configurations work with non-GPU-aware MPI.

Here is the -ksp_view log from one failure with the first configuration:

KSP Object: () 8 MPI processes
  type: cg
  maximum iterations=10000, nonzero initial guess
  tolerances: relative=0., absolute=0.0001, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: () 8 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.7
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down         1
      Sweeps up           1
      Sweeps on coarse    1
      Relax down          l1scaled-Jacobi
      Relax up            l1scaled-Jacobi
      Relax on coarse     Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Maximum size of coarsest grid 9
      Minimum size of coarsest grid 1
      Not using CF-relaxation
      Not using more complex smoothers.
      Measure type        local
      Coarsen type        PMIS
      Interpolation type  ext+i
      SpGEMM type         cusparse
  linear system matrix = precond matrix:
  Mat Object: () 8 MPI processes
    type: mpiaijcusparse
    rows=64000, cols=64000
    total: nonzeros=311040, allocated nonzeros=311040
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines

For the moment I have not succeeded in creating a reproducer from the ex*.c examples (a possible starting point is sketched after this message)...

Have you seen this kind of behaviour before?

Should I update my PETSc version?

Thanks for any advice,

Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°43
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
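Regarding the reproducer mentioned above, a possible starting point is sketched below, assuming a standard PETSc (>= 3.18) build with CUDA: it assembles a 3D 7-point Laplacian through DMDA and leaves the solver, matrix, and vector types entirely to run-time options, so the same binary can be run with and without GPU-aware MPI. The grid size, executable name, and the example option string in the leading comment are assumptions, not taken from the thread.

/* Hypothetical reproducer skeleton. Run, for example, as:
 *   mpirun -np 8 ./repro -dm_mat_type aijcusparse -dm_vec_type cuda \
 *     -ksp_type cg -pc_type gamg -pc_gamg_type classical -ksp_view
 * and compare the default (GPU-aware) run against -use_gpu_aware_mpi 0. */
#include <petscdmda.h>
#include <petscksp.h>

int main(int argc, char **argv)
{
  DM                 da;
  Mat                A;
  Vec                x, b;
  KSP                ksp;
  KSPConvergedReason reason;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* 40^3 grid = 64000 unknowns, matching the size in the -ksp_view above */
  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                         DMDA_STENCIL_STAR, 40, 40, 40, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                         1, 1, NULL, NULL, NULL, &da));
  PetscCall(DMSetFromOptions(da)); /* picks up -dm_mat_type / -dm_vec_type */
  PetscCall(DMSetUp(da));
  PetscCall(DMCreateMatrix(da, &A));
  PetscCall(DMCreateGlobalVector(da, &b));
  PetscCall(VecDuplicate(b, &x));

  /* Assemble a symmetric, diagonally dominant 7-point stencil */
  {
    DMDALocalInfo info;
    MatStencil    row, col[7];
    PetscScalar   v[7];
    PetscInt      i, j, k, n;
    PetscCall(DMDAGetLocalInfo(da, &info));
    for (k = info.zs; k < info.zs + info.zm; k++)
      for (j = info.ys; j < info.ys + info.ym; j++)
        for (i = info.xs; i < info.xs + info.xm; i++) {
          row.i = i; row.j = j; row.k = k;
          n = 0;
          v[n] = 6.0;  col[n].i = i;     col[n].j = j;     col[n].k = k;     n++;
          if (i > 0)           { v[n] = -1.0; col[n].i = i - 1; col[n].j = j;     col[n].k = k;     n++; }
          if (i < info.mx - 1) { v[n] = -1.0; col[n].i = i + 1; col[n].j = j;     col[n].k = k;     n++; }
          if (j > 0)           { v[n] = -1.0; col[n].i = i;     col[n].j = j - 1; col[n].k = k;     n++; }
          if (j < info.my - 1) { v[n] = -1.0; col[n].i = i;     col[n].j = j + 1; col[n].k = k;     n++; }
          if (k > 0)           { v[n] = -1.0; col[n].i = i;     col[n].j = j;     col[n].k = k - 1; n++; }
          if (k < info.mz - 1) { v[n] = -1.0; col[n].i = i;     col[n].j = j;     col[n].k = k + 1; n++; }
          PetscCall(MatSetValuesStencil(A, 1, &row, n, col, v, INSERT_VALUES));
        }
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  }

  PetscCall(VecSet(b, 1.0));
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp)); /* -ksp_type, -pc_type, ... from the command line */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPGetConvergedReason(ksp, &reason));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Converged reason: %s\n", KSPConvergedReasons[reason]));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}

Running it at increasing rank counts with -ksp_type cg -pc_type gamg -pc_gamg_type classical (or -pc_type hypre), once with the GPU-aware default and once with -use_gpu_aware_mpi 0, should show whether the divergence tracks the GPU-aware code path.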