Hi Mark,

Just to be clear, I do not think it is related to GAMG or PtAP. It is a communication issue:
I reran the same code, and I just got:

[252]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[252]PETSC ERROR: Petsc has generated inconsistent data
[252]PETSC ERROR: Received vector entry 4469094877509280860 out of local range [255426072,256718616)
[252]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[252]PETSC ERROR: Petsc Release Version 3.13.3, unknown
[252]PETSC ERROR: ../../griffin-opt on a arch-moose named r5i4n13 by kongf Mon Jul 20 12:16:47 2020
[252]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices --download-mumps=0
[252]PETSC ERROR: #1 VecAssemblyEnd_MPI_BTS() line 324 in /home/kongf/workhome/sawtooth/moosers/petsc/src/vec/vec/impls/mpi/pbvec.c
[252]PETSC ERROR: #2 VecAssemblyEnd() line 171 in /home/kongf/workhome/sawtooth/moosers/petsc/src/vec/vec/interface/vector.c
[cli_252]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 1) - process 252

Thanks,

Fande

On Mon, Jul 20, 2020 at 12:24 PM Mark Adams <mfad...@lbl.gov> wrote:

> OK, so this is happening in MatProductNumeric_PtAP. This must be in
> constructing the coarse grid.
>
> GAMG wants to coarsen at a rate of roughly 30:1, though that needs to be
> verified. At that rate, your index is about the size of the first coarse
> grid. I am trying to figure out whether the index is valid, but the
> largest allowed key is 740521, which is about what I would guess is the
> size of the second coarse grid.
>
> So it looks like a "fine"-grid index has turned up in a "coarse" grid
> (the 2nd or 3rd coarse grid).
>
> But Chris is not using GAMG.
> Chris: It sounds like you just have one matrix that you give to MUMPS,
> yet you seem to be creating a matrix in the middle of your run. Are you
> doing dynamic adaptivity?
>
> I think we generate unique tags for each operation, but it sounds like a
> message may be getting mixed up in some way.
>
> On Mon, Jul 20, 2020 at 12:35 PM Fande Kong <fdkong...@gmail.com> wrote:
>
>> Hi Mark,
>>
>> Thanks for your reply.
>>
>> On Mon, Jul 20, 2020 at 7:13 AM Mark Adams <mfad...@lbl.gov> wrote:
>>
>>> Fande,
>>> Do you know if your 45226154 was out of range in the real matrix?
>>
>> I do not know, since it happened while building the AMG hierarchy. The
>> size of the original system is 1,428,284,880.
>>
>>> What size integers do you use?
>>
>> We are using 64-bit indices via "--with-64-bit-indices".
>>
>> I am trying to pin down the cause of this issue by running more
>> simulations with different configurations.
>>
>> Thanks,
>>
>> Fande
>>
>>> Thanks,
>>> Mark
>>>
>>> On Mon, Jul 20, 2020 at 1:17 AM Fande Kong <fdkong...@gmail.com> wrote:
>>>
>>>> The trace looks like this:
>>>>
>>>> [640]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> [640]PETSC ERROR: Argument out of range
>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521
>>>> [640]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown
>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020
>>>> [640]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices --download-mumps=0
>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h
>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c
>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c
>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c
>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c
>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c
>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c
>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c
>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c
>>>>
>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong <fdkong...@gmail.com> wrote:
>>>>
>>>>> I am not entirely sure what is happening, but we encountered similar
>>>>> issues recently. They were not reproducible: they could occur at
>>>>> different stages, and the errors could be strange ones beyond the
>>>>> "ctable" message. Our code is Valgrind-clean, since every PR in MOOSE
>>>>> must pass rigorous Valgrind checks before it reaches the devel branch.
>>>>> The errors happened when we used MVAPICH.
>>>>>
>>>>> We switched to HPE MPT (a vendor-installed MPI), and then everything
>>>>> was smooth. Could you try a different MPI? It is best to try one
>>>>> provided by the system.
>>>>> We have not gotten to the bottom of this problem yet, but we at least
>>>>> know that it is somehow MPI-related.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Fande
>>>>>
>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson <ch...@resfrac.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am hitting a bug in PETSc that reports:
>>>>>>
>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in
>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater
>>>>>> than largest key allowed 5693
>>>>>>
>>>>>> This is with petsc-3.13.2, compiled and run with MPICH, with -O3,
>>>>>> debugging turned off, and tuned for the Haswell architecture. The
>>>>>> error occurs either before or during a KSPBCGS solve/setup, or during
>>>>>> a MUMPS factorization solve (I have not been able to replicate the
>>>>>> issue with the same set of instructions, etc.).
>>>>>>
>>>>>> This is a terrible way to ask a question, I know, and not very
>>>>>> helpful from your side, but it is what I have from a user's run, and
>>>>>> I cannot reproduce it on my end (either with the optimized build or
>>>>>> with debugging turned on). It happens after the code has run for
>>>>>> quite some time, and only rarely.
>>>>>>
>>>>>> More than likely I am using a static variable (the code is written in
>>>>>> C++) that I am not updating when the matrix size changes, or
>>>>>> something silly like that, but any help or guidance on this would be
>>>>>> appreciated.
>>>>>>
>>>>>> *Chris Hewson*
>>>>>> Senior Reservoir Simulation Engineer
>>>>>> ResFrac
>>>>>> +1.587.575.9792