After recompiling PETSc with the 64-bit indices option (--with-64-bit-indices), the program ran successfully. Thank you very much for the insight.
On Thu, Jun 11, 2020 at 12:00 PM Satish Balay <ba...@mcs.anl.gov> wrote:

> On Thu, 11 Jun 2020, Karl Lin wrote:
>
> > Hi, Matthew
> >
> > Thanks for the suggestion. I just did another run, and here are some
> > detailed stack traces that may provide more insight:
> >
> > *** Process received signal ***
> > Signal: Aborted (6)
> > Signal code: (-6)
> > /lib64/libpthread.so.0(+0xf5f0)[0x2b56c46dc5f0]
> > [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b56c5486337]
> > [ 2] /lib64/libc.so.6(abort+0x148)[0x2b56c5487a28]
> > [ 3] /libpetsc.so.3.10(PetscTraceBackErrorHandler+0xc4)[0x2b56c1e6a2d4]
> > [ 4] /libpetsc.so.3.10(PetscError+0x1b5)[0x2b56c1e69f65]
> > [ 5] /libpetsc.so.3.10(PetscCommBuildTwoSidedFReq+0x19f0)[0x2b56c1e03cf0]
> > [ 6] /libpetsc.so.3.10(+0x77db17)[0x2b56c2425b17]
> > [ 7] /libpetsc.so.3.10(+0x77a164)[0x2b56c2422164]
> > [ 8] /libpetsc.so.3.10(MatAssemblyBegin_MPIAIJ+0x36)[0x2b56c23912b6]
> > [ 9] /libpetsc.so.3.10(MatAssemblyBegin+0xca)[0x2b56c1feccda]
> >
> > By reconfiguring, you mean recompiling PETSc with that option, correct?
>
> Yes. You can use a different PETSC_ARCH for this build so that both
> builds are usable (just switch PETSC_ARCH in your application
> makefile).
>
> Satish
>
> >
> > Thank you.
> >
> > Karl
> >
> > On Thu, Jun 11, 2020 at 10:56 AM Matthew Knepley <knep...@gmail.com> wrote:
> >
> > > On Thu, Jun 11, 2020 at 11:51 AM Karl Lin <karl.lin...@gmail.com> wrote:
> > >
> > >> Hi there,
> > >>
> > >> We have written a program using PETSc to solve a large sparse matrix
> > >> system. It has been working fine for a while. Recently we encountered a
> > >> problem when the size of the sparse matrix is larger than 10TB. We used
> > >> several hundred nodes and 2200 processes. The program always crashes
> > >> during MatAssemblyBegin. Upon a closer look, there seems to be something
> > >> unusual. We have a small memory check while loading the matrix to keep
> > >> track of rss. The printout of rss in the log shows a normal increase up
> > >> to rank 2160; i.e., if we load a 1GB portion of the matrix, then after
> > >> MatSetValues for that portion, rss increases by roughly that amount.
> > >> From rank 2161 onwards, the rss on every rank doesn't increase after
> > >> the matrix is loaded. Then, in MatAssemblyBegin, the program crashed on
> > >> rank 2160.
> > >>
> > >> Is there an upper limit on the number of processes PETSc can handle? Or
> > >> is there an upper limit on the size of the matrix PETSc can handle?
> > >> Thank you very much for any info.
> > >>
> > >
> > > It sounds like you overflowed int somewhere. We try and check for this,
> > > but catching every place is hard. Try reconfiguring with
> > >
> > >   --with-64-bit-indices
> > >
> > > Thanks,
> > >
> > >    Matt
> > >
> > >
> > >> Regards,
> > >>
> > >> Karl
> > >>
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> > > experiments is infinitely more interesting than any results to which
> > > their experiments lead.
> > > -- Norbert Wiener
> > >
> > > https://www.cse.buffalo.edu/~knepley/