Matt is correct on his point 2. And I'll get fresh output to send your way. Stay tuned.
On Wed, Apr 15, 2020, 2:21 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
> I want to know who called MPI_Init(). Petsc or Chombo?
> --Junchao Zhang
>
>
> On Wed, Apr 15, 2020 at 4:13 PM Matthew Knepley <knep...@gmail.com> wrote:
>
>> On Wed, Apr 15, 2020 at 5:10 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>>
>>> Was there a petsc error stack?
>>>
>>
>> 1) SNES ex5 is a highly scalable problem. Just give it large enough m
>> and n.
>>
>> 2) Junchao, it looks like MPI_Init() is failing, which I believe comes
>> before we install our signal handler to get us the stack.
>>
>> Thanks,
>>
>> Matt
>>
>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Apr 15, 2020 at 3:41 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>>> Whoops, this is actually Cori-KNL.
>>>>
>>>> On Wed, Apr 15, 2020 at 4:33 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>>
>>>>> We have a problem when going from 32K to 64K cores on Cori-haswell.
>>>>> Does anyone have any thoughts?
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>> ---------- Forwarded message ---------
>>>>> From: David Trebotich <dptrebot...@lbl.gov>
>>>>> Date: Wed, Apr 15, 2020 at 4:20 PM
>>>>> Subject: Re: petsc on Cori Haswell
>>>>> To: Mark Adams <mfad...@lbl.gov>
>>>>>
>>>>>
>>>>> Hey Mark-
>>>>> I am running into some issues that I am convinced are from the PETSc
>>>>> build. I am able to build and run on up to 32K cores. At 64K I start
>>>>> getting stuff like below (looks like two issues: pmi stuff and MPI_Init).
>>>>> I have been working with Brian Freisen to see if it's a NERSC problem.
>>>>> At this point I build without PETSc and then run native gmg in Chombo
>>>>> and have no problems. The problems only come with building with PETSc,
>>>>> and at larger concurrencies. The only thing that has changed is that
>>>>> this is a new PETSc installation. Perhaps something changed in the
>>>>> PETSc version you built from previously? Thanks for the help.
>>>>> Treb
>>>>>
>>>>> Mon Apr 13 17:49:45 2020: [PE_101955]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_101958]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_101958]:_pmi_init:_pmi_mmap_init returned -1
>>>>> Mon Apr 13 17:49:45 2020: [PE_101979]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_101979]:_pmi_init:_pmi_mmap_init returned -1
>>>>> Mon Apr 13 17:49:45 2020: [PE_82712]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=28, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_17868]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_97918]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_17869]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_17869]:_pmi_init:_pmi_mmap_init returned -1
>>>>> Mon Apr 13 17:49:45 2020: [PE_110562]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=27, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_110562]:_pmi_init:_pmi_mmap_init returned -1
>>>>> Mon Apr 13 17:49:45 2020: [PE_110563]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=27, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_27899]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=38, pes_this_node=64, timeout=180 secs
>>>>> [Mon Apr 13 17:49:45 2020] [c7-4c1s6n0] Fatal error in MPI_Init: Other MPI error, error stack:
>>>>> MPIR_Init_thread(537):
>>>>> MPID_Init(246).......: channel initialization failed
>>>>> MPID_Init(647).......: PMI2 init failed: 1
>>>>> Attempting to use an MPI routine before initializing MPICH
>>>>> [Mon Apr 13 17:49:45 2020] [c7-4c1s6n0] Fatal error in MPI_Init: Other MPI error, error stack:
>>>>> MPIR_Init_thread(537):
>>>>> MPID_Init(246).......: channel initialization failed
>>>>> MPID_Init(647).......: PMI2 init failed: 1
>>>>> Attempting to use an MPI routine before initializing MPICH
>>>>> Mon Apr 13 17:49:45 2020: [PE_71961]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_71961]:_pmi_init:_pmi_mmap_init returned -1
>>>>> Mon Apr 13 17:49:45 2020: [PE_71962]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_64329]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_64335]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_64335]:_pmi_init:_pmi_mmap_init returned -1
>>>>> [Mon Apr 13 17:49:45 2020] [c6-1c2s5n2] Fatal error in MPI_Init: Other MPI error, error stack:
>>>>> MPIR_Init_thread(537):
>>>>> MPID_Init(246).......: channel initialization failed
>>>>> MPID_Init(647).......: PMI2 init failed: 1
>>>>> Attempting to use an MPI routine before initializing MPICH
>>>>> [Mon Apr 13 17:49:45 2020] [c9-4c2s13n2] Fatal error in MPI_Init: Other MPI error, error stack:
>>>>> MPIR_Init_thread(537):
>>>>> MPID_Init(246).......: channel initialization failed
>>>>> MPID_Init(647).......: PMI2 init failed: 1
>>>>> Attempting to use an MPI routine before initializing MPICH
>>>>> Mon Apr 13 17:49:45 2020: [PE_71960]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_71960]:_pmi_init:_pmi_mmap_init returned -1
>>>>> [Mon Apr 13 17:49:45 2020] [c6-3c2s9n1] Fatal error in MPI_Init: Other MPI error, error stack:
>>>>> MPIR_Init_thread(537):
>>>>> MPID_Init(246).......: channel initialization failed
>>>>>
>>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>
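For the "fresh output," it may help to tally the PMI messages rather than read them by eye. The following is a minimal Python sketch, assuming each log message starts on its own line (possibly behind `>` quote markers) and uses exactly the Cray PMI/MPICH wording quoted in this thread. The helper `summarize_barrier_failures` is hypothetical. It only extracts the PE number and the `num_syncd`/`pes_this_node`/`timeout` fields from each bootstrap-barrier warning, plus the node names from the `Fatal error in MPI_Init` lines. It says nothing about *why* the barrier timed out, or whether PETSc or Chombo made the `MPI_Init()` call.

```python
import re
from collections import Counter

# Assumption: these patterns match the Cray PMI / MPICH messages exactly as
# quoted in this thread; other library versions may word them differently.
BARRIER_RE = re.compile(
    r"\[PE_(\d+)\]:_pmi_mmap_tmp: Warning bootstrap barrier failed: "
    r"num_syncd=(\d+), pes_this_node=(\d+), timeout=(\d+) secs"
)
FATAL_RE = re.compile(r"\] \[([^\]]+)\] Fatal error in MPI_Init")


def summarize_barrier_failures(log_text):
    """Hypothetical helper: parse PMI barrier warnings and MPI_Init failures.

    Returns (warnings, synced_counts, nodes): warnings is a list of dicts,
    synced_counts is a Counter of num_syncd values, and nodes lists the node
    names from "Fatal error in MPI_Init" lines in first-seen order.
    """
    # Drop e-mail quote markers ("> ", ">>>>> ") at the start of each line,
    # then join lines so messages wrapped by the mail client match again.
    joined = " ".join(line.lstrip("> ") for line in log_text.splitlines())
    text = " ".join(joined.split())

    warnings = [
        {
            "pe": int(pe),
            "num_syncd": int(synced),
            "pes_this_node": int(total),
            "timeout_s": int(timeout),
        }
        for pe, synced, total, timeout in BARRIER_RE.findall(text)
    ]
    synced_counts = Counter(w["num_syncd"] for w in warnings)
    # dict.fromkeys removes duplicate node names while keeping their order.
    nodes = list(dict.fromkeys(FATAL_RE.findall(text)))
    return warnings, synced_counts, nodes


if __name__ == "__main__":
    sample = """\
>>>>> Mon Apr 13 17:49:45 2020: [PE_101955]:_pmi_mmap_tmp: Warning bootstrap
>>>>> barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>>> Mon Apr 13 17:49:45 2020: [PE_82712]:_pmi_mmap_tmp: Warning bootstrap
>>>>> barrier failed: num_syncd=28, pes_this_node=64, timeout=180 secs
>>>>> [Mon Apr 13 17:49:45 2020] [c7-4c1s6n0] Fatal error in MPI_Init: Other
>>>>> MPI error, error stack:
"""
    warnings, synced_counts, nodes = summarize_barrier_failures(sample)
    for w in warnings:
        print(f"PE {w['pe']}: {w['num_syncd']}/{w['pes_this_node']} PEs synced "
              f"within {w['timeout_s']} s")
    print("num_syncd histogram:", dict(synced_counts))
    print("nodes with MPI_Init failures:", nodes)
```

Stripping the quote markers and re-joining lines before matching is what lets the patterns recover warnings that a mail client wrapped across two lines, as happened with the output in this thread.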