On Thu, Jun 11, 2020 at 11:51 AM Karl Lin <karl.lin...@gmail.com> wrote:
> Hi there,
>
> We have written a program using PETSc to solve a large sparse matrix system.
> It has been working fine for a while. Recently we encountered a problem
> when the size of the sparse matrix is larger than 10TB. We used several
> hundred nodes and 2200 processes. The program always crashes during
> MatAssemblyBegin. On closer inspection, there seems to be something unusual.
> We have a small memory check while loading the matrix to keep track of
> rss. The printout of rss in the log shows a normal increase up to rank 2160,
> i.e., if we load a portion of the matrix that is 1GB, then after MatSetValues
> for that portion, rss increases by roughly that amount. From rank 2161
> onwards, the rss on every rank doesn't increase after the matrix is loaded.
> Then comes MatAssemblyBegin, and the program crashes on rank 2160.
>
> Is there an upper limit on the number of processes PETSc can handle? Or is
> there an upper limit on the size of the matrix PETSc can handle?
> Thank you very much for any info.
>

It sounds like you overflowed int somewhere. We try to check for this, but
catching every place is hard. Try reconfiguring with

  --with-64-bit-indices

  Thanks,

     Matt

> Regards,
>
> Karl

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
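
To illustrate the kind of overflow suggested above, here is a minimal back-of-the-envelope sketch in Python. It is not PETSc code. The 12 bytes per nonzero (an 8-byte value plus a 4-byte column index) and the even split across ranks are assumptions made for the example, and the helper names are hypothetical. It only shows that a global count for a matrix of roughly 10TB cannot be held in a 32-bit signed integer, even when each rank's local count still fits. It does not identify where the overflow actually occurred in this run.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def fits_in_int32(value: int) -> bool:
    """Return True if value is representable as a 32-bit signed integer."""
    return -2**31 <= value <= INT32_MAX


def wrap_int32(value: int) -> int:
    """Reinterpret value as a two's-complement 32-bit signed integer,
    mimicking what a silent 32-bit wraparound would produce."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


# Assumptions for illustration only (not taken from the thread):
BYTES_PER_NONZERO = 12          # 8-byte double + 4-byte column index
matrix_bytes = 10 * 10**12      # "larger than 10TB"
nprocs = 2200                   # number of MPI processes reported

total_nonzeros = matrix_bytes // BYTES_PER_NONZERO
per_rank_nonzeros = total_nonzeros // nprocs  # assumes an even split

print("per-rank nonzeros fit in int32:", fits_in_int32(per_rank_nonzeros))
print("global nonzeros fit in int32:  ", fits_in_int32(total_nonzeros))
print("global count after 32-bit wrap:", wrap_int32(total_nonzeros))
```

Under these assumptions the per-rank count fits but the global count does not, so any 32-bit quantity that accumulates across ranks would be silently corrupted. That is consistent with a problem that shows up only at a high rank number, and it is the situation that 64-bit indices are meant to avoid.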