Hi Matt,

Attached is the output of valgrind:

$ mpirun -mca coll_hcoll_enable 0 -n 2 valgrind --track-origins=yes ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > out 2>&1

Chris

________________________________________
From: Matthew Knepley <knep...@gmail.com>
Sent: Thursday, July 10, 2025 1:37 PM
To: Klaij, Christiaan
Cc: Junchao Zhang; PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

On Thu, Jul 10, 2025 at 4:39 AM Klaij, Christiaan via petsc-users <petsc-users@mcs.anl.gov> wrote:

An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below.

The error on its face should be impossible. On line 289, we pass pointers to two variables on the stack. This would seem to indicate more general memory corruption. I know we asked before, but have you run under Address Sanitizer or Valgrind?

Thanks,

Matt

Chris

$ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
0 KSP Residual norm 1.11803
1 KSP Residual norm 0.591608
2 KSP Residual norm 0.316228
3 KSP Residual norm < 1.e-11
0 KSP Residual norm 0.707107
1 KSP Residual norm 0.408248
2 KSP Residual norm < 1.e-11
Norm of error < 1.e-12 iterations 3
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: General MPI error
[1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
[1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025
[1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
[1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
[1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377
[1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
[1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
[1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
[1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
[1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040
[1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
Proc: [[55228,1],1]
Errorcode: 98

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
prterun has exited due to process rank 1 with PID 0 on node login1 calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------

________________________________________
dr. ir. Christiaan Klaij | senior researcher
Research & Development | CFD Development
T +31 317 49 33 44 | http://www.marin.nl

From: Klaij, Christiaan <c.kl...@marin.nl>
Sent: Thursday, July 10, 2025 10:15 AM
To: Junchao Zhang
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

Hi Junchao,

Thanks for testing.
I've fixed the error but unfortunately that doesn't change the behavior; the code still hangs as before, with the same stack trace...

Chris

________________________________________
From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Tuesday, July 8, 2025 10:58 PM
To: Klaij, Christiaan
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

Hi, Chris,

First, I had to fix an error in your test by adding "PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Mat object's type is not set: Argument # 1
...
[0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
[0]PETSC ERROR: #2 ex2f.F90:258

Then I could run the test without problems:

mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
0 KSP Residual norm 1.11803
1 KSP Residual norm 0.591608
2 KSP Residual norm 0.316228
3 KSP Residual norm < 1.e-11
0 KSP Residual norm 0.707107
1 KSP Residual norm 0.408248
2 KSP Residual norm < 1.e-11
Norm of error < 1.e-12 iterations 3

I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with

./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"

Could you fix the error and retry?

--Junchao Zhang

On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users <petsc-users@mcs.anl.gov> wrote:

Attached is a standalone example of the issue described in the earlier thread "problem with nested logging". The issue appeared somewhere between petsc 3.19.4 and 3.23.4. The example is a variation of ../ksp/tutorials/ex2f.F90, where I've added the nested log viewer with one event as well as the solution of a small system on rank zero. When running on multiple procs the example hangs during PetscLogView with the backtrace below. The configure.log is also attached in the hope that you can replicate the issue.

Chris

#0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, src=1, tag=-12, comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
#1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>, op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630) at base/coll_base_allreduce.c:247
#2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>, op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630, algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
#3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>, op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630) at coll_tuned_decision_fixed.c:216
#4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>, op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaecb80) at coll_hcoll_ops.c:217
#5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30) at allreduce.c:123
#6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#18 0x0000000000402c8b in MAIN__ ()
#19 0x00000000004023df in main ()

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
out
Description: out