On Thu, Jul 10, 2025 at 8:46 AM Klaij, Christiaan <c.kl...@marin.nl> wrote:
> Hi Matt,
>
> Attached is the output of valgrind:
>
> $ mpirun -mca coll_hcoll_enable 0 -n 2 valgrind --track-origins=yes ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > out 2>&1

Hmm, so no MPI error when running with valgrind? It looks like Junchao and
I cannot reproduce it here. It is puzzling. Would you be able to try MPICH
instead?

  Thanks,

     Matt

> Chris
>
> ________________________________________
> From: Matthew Knepley <knep...@gmail.com>
> Sent: Thursday, July 10, 2025 1:37 PM
> To: Klaij, Christiaan
> Cc: Junchao Zhang; PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> On Thu, Jul 10, 2025 at 4:39 AM Klaij, Christiaan via petsc-users
> <petsc-users@mcs.anl.gov> wrote:
> An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0,
> the code does not hang but gives the error below.
>
> The error on its face should be impossible. On line 289, we pass pointers
> to two variables on the stack. This would seem to indicate more general
> memory corruption.
>
> I know we asked before, but have you run under Address Sanitizer or
> Valgrind?
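> For example, with GCC or Clang the library and example can be rebuilt
> with AddressSanitizer enabled, roughly like this (a sketch; the exact
> flags depend on your toolchain, and "..." stands for your other
> configure options):
>
>   $ ./configure ... CFLAGS="-g -O1 -fsanitize=address" \
>       FCFLAGS="-g -O1 -fsanitize=address" LDFLAGS="-fsanitize=address"
>   $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi \
>       -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always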
> Thanks,
>
>    Matt
>
> Chris
>
> $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>   0 KSP Residual norm 1.11803
>   1 KSP Residual norm 0.591608
>   2 KSP Residual norm 0.316228
>   3 KSP Residual norm < 1.e-11
>   0 KSP Residual norm 0.707107
>   1 KSP Residual norm 0.408248
>   2 KSP Residual norm < 1.e-11
> Norm of error < 1.e-12 iterations 3
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: General MPI error
> [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
> [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
> [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025
> [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
> [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
> [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377
> [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
> [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
> [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
> [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
> [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040
> [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
>   Proc: [[55228,1],1]
>   Errorcode: 98
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> prterun has exited due to process rank 1 with PID 0 on node login1 calling
> "abort". This may have caused other processes in the application to be
> terminated by signals sent by prterun (as reported here).
> --------------------------------------------------------------------------
>
> ________________________________________
> dr. ir. Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44 | http://www.marin.nl
>
> From: Klaij, Christiaan <c.kl...@marin.nl>
> Sent: Thursday, July 10, 2025 10:15 AM
> To: Junchao Zhang
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> Hi Junchao,
>
> Thanks for testing. I've fixed the error but unfortunately that doesn't
> change the behavior: the code still hangs as before, with the same stack
> trace...
>
> Chris
>
> ________________________________________
> From: Junchao Zhang <junchao.zh...@gmail.com>
> Sent: Tuesday, July 8, 2025 10:58 PM
> To: Klaij, Christiaan
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> Hi, Chris,
> First, I had to fix an error in your test by adding
> "PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.
>
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Mat object's type is not set: Argument # 1
> ...
> [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
> [0]PETSC ERROR: #2 ex2f.F90:258
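> The Mat type must be set before MatSetValues() can be called; the usual
> creation sequence looks roughly like this (a sketch; the sizes m and n
> are placeholders for whatever the example uses):
>
>   PetscCallA(MatCreate(PETSC_COMM_SELF,AA,ierr))
>   PetscCallA(MatSetSizes(AA,PETSC_DECIDE,PETSC_DECIDE,m,n,ierr))
>   PetscCallA(MatSetFromOptions(AA,ierr))  ! sets the type (AIJ by default)
>   PetscCallA(MatSetUp(AA,ierr))           ! after this MatSetValues() works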
> Then I could run the test without problems:
>
> mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>   0 KSP Residual norm 1.11803
>   1 KSP Residual norm 0.591608
>   2 KSP Residual norm 0.316228
>   3 KSP Residual norm < 1.e-11
>   0 KSP Residual norm 0.707107
>   1 KSP Residual norm 0.408248
>   2 KSP Residual norm < 1.e-11
> Norm of error < 1.e-12 iterations 3
>
> I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with
> ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
>
> Could you fix the error and retry?
>
> --Junchao Zhang
>
> On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users
> <petsc-users@mcs.anl.gov> wrote:
> Attached is a standalone example of the issue described in the
> earlier thread "problem with nested logging". The issue appeared
> somewhere between petsc 3.19.4 and 3.23.4.
>
> The example is a variation of ../ksp/tutorials/ex2f.F90, where
> I've added the nested log viewer with one event, as well as the
> solution of a small system on rank zero.
>
> When running on multiple procs the example hangs during
> PetscLogView with the backtrace below. The configure.log is also
> attached in the hope that you can replicate the issue.
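> In outline, the logging part of the example does roughly the following
> (a simplified sketch; the event and file names are illustrative):
>
>   PetscLogEvent :: event
>   PetscViewer   :: viewer
>
>   PetscCallA(PetscLogNestedBegin(ierr))
>   PetscCallA(PetscLogEventRegister('MyEvent',0,event,ierr))
>   PetscCallA(PetscLogEventBegin(event,ierr))
>   ! ... assemble and solve, on all ranks and on rank zero only ...
>   PetscCallA(PetscLogEventEnd(event,ierr))
>   PetscCallA(PetscViewerASCIIOpen(PETSC_COMM_WORLD,'log.xml',viewer,ierr))
>   PetscCallA(PetscViewerPushFormat(viewer,PETSC_VIEWER_ASCII_XML,ierr))
>   PetscCallA(PetscLogView(viewer,ierr))   ! hangs here on multiple procs
>   PetscCallA(PetscViewerDestroy(viewer,ierr))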
> Chris
>
> #0  0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1,
>     datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, src=1, tag=-12,
>     comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
> #1  0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (
>     sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>     dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>     op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
>     at base/coll_base_allreduce.c:247
> #2  0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (
>     sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>     dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>     op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630,
>     algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
> #3  0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (
>     sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>     dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>     op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
>     at coll_tuned_decision_fixed.c:216
> #4  0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20,
>     rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>     op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaecb80)
>     at coll_hcoll_ops.c:217
> #5  0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20,
>     recvbuf=0x7fffffff9e30, count=1,
>     datatype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>     op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30) at allreduce.c:123
> #6  0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #7  0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #8  0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #9  0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #18 0x0000000000402c8b in MAIN__ ()
> #19 0x00000000004023df in main ()
> dr. ir. Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44 | http://www.marin.nl

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/