Hi Mark,

> Chris: It sounds like you just have one matrix that you give to MUMPS. You seem to be creating a matrix in the middle of your run. Are you doing dynamic adaptivity?

I have 2 separate matrices that I give to MUMPS, but as this is happening in the production build of my code, I can't determine with certainty which call to MUMPS, KSPBCGS, or UMFPACK it's happening in.
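A minimal sketch (not from the thread) of one way to tell the two solves apart at runtime: giving each KSP its own options prefix lets either system be reconfigured, or swapped between MUMPS, UMFPACK, and BCGS, from the command line, which can help narrow down which solve the fault occurs in. The prefixes and the function name are made up, and PetscCall/PETSC_SUCCESS assume a recent PETSc (older releases spell this ierr = ...; CHKERRQ(ierr)).

#include <petscksp.h>

/* Illustrative only: tag each solve with its own options prefix, e.g.
     -sysA_pc_type lu -sysA_pc_factor_mat_solver_type mumps
     -sysB_ksp_type bcgs
   so the two systems can be configured independently at run time. */
static PetscErrorCode TagSolvers(KSP kspA, KSP kspB)
{
  PetscFunctionBeginUser;
  PetscCall(KSPSetOptionsPrefix(kspA, "sysA_"));
  PetscCall(KSPSetOptionsPrefix(kspB, "sysB_"));
  PetscCall(KSPSetFromOptions(kspA));
  PetscCall(KSPSetFromOptions(kspB));
  PetscFunctionReturn(PETSC_SUCCESS);
}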
I do destroy and recreate matrices in the middle of my runs, but this happens multiple times before the fault occurs and (presumably) in the same way. I also do checks on the matrix sizes and on what I am sending to PETSc, and those all pass; it's just that at some point there are size mismatches somewhere. Understandably, this is not a lot to go on.

I am not doing dynamic adaptivity; the mesh is instead changing its size.

And I agree with Fande: the most frustrating part is that it's not reproducible. But yeah, I'm not 100% sure that the problem lies within the PETSc code base either.

Current working theories are:

1. Some sort of MPI problem with the sending of one of the matrix elements (using MPICH version 3.3a2); a sketch of checking the indices on the sending side is below, after the quoted messages.
2. Some of the memory behind static pointers gets corrupted, although I would expect a garbage number rather than something that could plausibly make sense.

*Chris Hewson*
Senior Reservoir Simulation Engineer
ResFrac
+1.587.575.9792

On Mon, Jul 20, 2020 at 12:41 PM Mark Adams <mfad...@lbl.gov> wrote:

> On Mon, Jul 20, 2020 at 2:36 PM Fande Kong <fdkong...@gmail.com> wrote:
>
>> Hi Mark,
>>
>> Just to be clear, I do not think it is related to GAMG or PtAP. It is a
>> communication issue:
>>
> Your stack trace was from PtAP, but Chris's problem is not.
>
>> Reran the same code, and I just got:
>>
>> [252]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [252]PETSC ERROR: Petsc has generated inconsistent data
>> [252]PETSC ERROR: Received vector entry 4469094877509280860 out of local range [255426072,256718616)]
>>
> OK, now this (4469094877509280860) is clearly garbage. That is the
> important thing. I have to think your MPI is buggy.
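A minimal sketch of the index check mentioned under theory 1 above, assuming a recent PETSc (PetscCall, variadic SETERRQ, and PETSC_SUCCESS; older releases use ierr = ...; CHKERRQ(ierr) and SETERRQ2). The helper name is made up and is not part of Chris's code or of PETSc: it simply rejects any global index outside the matrix dimensions before the value reaches the assembly stash and is communicated.

#include <petscmat.h>

/* Illustrative guard around MatSetValues: reject any global index outside the
   matrix dimensions on the sending rank, before assembly communicates it. */
static PetscErrorCode CheckedMatSetValues(Mat A, PetscInt m, const PetscInt rows[],
                                          PetscInt n, const PetscInt cols[],
                                          const PetscScalar vals[], InsertMode mode)
{
  PetscInt M, N;

  PetscFunctionBeginUser;
  PetscCall(MatGetSize(A, &M, &N)); /* global dimensions of A */
  for (PetscInt i = 0; i < m; i++) {
    if (rows[i] < 0 || rows[i] >= M)
      SETERRQ(PETSC_COMM_SELF, PETSC_ERR_ARG_OUTOFRANGE,
              "Row index %" PetscInt_FMT " outside [0,%" PetscInt_FMT ")", rows[i], M);
  }
  for (PetscInt j = 0; j < n; j++) {
    if (cols[j] < 0 || cols[j] >= N)
      SETERRQ(PETSC_COMM_SELF, PETSC_ERR_ARG_OUTOFRANGE,
              "Column index %" PetscInt_FMT " outside [0,%" PetscInt_FMT ")", cols[j], N);
  }
  PetscCall(MatSetValues(A, m, rows, n, cols, vals, mode));
  PetscFunctionReturn(PETSC_SUCCESS);
}

With a check like this in place, a garbage index such as 4469094877509280860 would abort on the rank that produced it, which would help separate theory 1 (corruption in transit) from corruption that happens before the send.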