Hi Matt,

Attached is the output of valgrind:

$ mpirun -mca coll_hcoll_enable 0 -n 2 valgrind --track-origins=yes 
./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short 
-ksp_gmres_cgs_refinement_type refine_always > out 2>&1

Chris


________________________________________
From: Matthew Knepley <knep...@gmail.com>
Sent: Thursday, July 10, 2025 1:37 PM
To: Klaij, Christiaan
Cc: Junchao Zhang; PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

On Thu, Jul 10, 2025 at 4:39 AM Klaij, Christiaan via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the 
code does not hang but gives the error below.

The error on its face should be impossible. On line 289, we pass pointers to 
two variables on the stack. This would seem to indicate more general memory 
corruption.

I know we asked before, but have you run under Address Sanitizer or Valgrind?

  Thanks,

     Matt

Chris


$ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi 
-ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
0 KSP Residual norm 1.11803
1 KSP Residual norm 0.591608
2 KSP Residual norm 0.316228
3 KSP Residual norm < 1.e-11
0 KSP Residual norm 0.707107
1 KSP Residual norm 0.408248
2 KSP Residual norm < 1.e-11
Norm of error < 1.e-12 iterations 3
[1]PETSC ERROR: --------------------- Error Message 
--------------------------------------------------------------
[1]PETSC ERROR: General MPI error
[1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
[1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on 
login1 by cklaij Thu Jul 10 10:33:33 2025
[1]PETSC ERROR: Configure options: 
--prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs 
--with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 
--with-debugging=0 
--download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz 
--with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 
--download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz 
--download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz 
 --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild 
--with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall 
-funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall 
-funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall 
-funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall 
-funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops 
-ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
-Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops 
-ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
-Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops 
-ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
-Wno-unused-function -O3 -DNDEBUG"
[1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
[1]PETSC ERROR: #2 PetscLogNestedTreePrint() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377
[1]PETSC ERROR: #3 PetscLogNestedTreePrint() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
[1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
[1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
[1]PETSC ERROR: #7 PetscLogHandlerView() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
[1]PETSC ERROR: #8 PetscLogView() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040
[1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
Proc: [[55228,1],1]
Errorcode: 98

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
prterun has exited due to process rank 1 with PID 0 on node login1 calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------

________________________________________
dr. ir. Christiaan Klaij | senior researcher
Research & Development | CFD Development
T +31 317 49 33 44 | http://www.marin.nl


From: Klaij, Christiaan <c.kl...@marin.nl>
Sent: Thursday, July 10, 2025 10:15 AM
To: Junchao Zhang
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

Hi Junchao,

Thanks for testing. I've fixed the error, but unfortunately that doesn't change 
the behavior: the code still hangs as before, with the same stack trace...

Chris

________________________________________
From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Tuesday, July 8, 2025 10:58 PM
To: Klaij, Christiaan
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

Hi, Chris,
First, I had to fix an error in your test by adding 
"PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.
[0]PETSC ERROR: --------------------- Error Message 
--------------------------------------------------------------
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Mat object's type is not set: Argument # 1
...
[0]PETSC ERROR: #1 MatSetValues() at 
/scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
[0]PETSC ERROR: #2 ex2f.F90:258

Then I could run the test without problems:
mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short 
-ksp_gmres_cgs_refinement_type refine_always
0 KSP Residual norm 1.11803
1 KSP Residual norm 0.591608
2 KSP Residual norm 0.316228
3 KSP Residual norm < 1.e-11
0 KSP Residual norm 0.707107
1 KSP Residual norm 0.408248
2 KSP Residual norm < 1.e-11
Norm of error < 1.e-12 iterations 3

I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with
./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi 
--with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall 
-funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall 
-funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall 
-funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall 
-funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops 
-ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
-Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops 
-ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
-Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops 
-ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
-Wno-unused-function -O3 -DNDEBUG"

Could you fix the error and retry?

--Junchao Zhang


On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Attached is a standalone example of the issue described in the
earlier thread "problem with nested logging". The issue appeared
somewhere between petsc 3.19.4 and 3.23.4.

The example is a variation of ../ksp/tutorials/ex2f.F90, where
I've added the nested log viewer with one event as well as the
solution of a small system on rank zero.

When running on multiple procs the example hangs during
PetscLogView with the backtrace below. The configure.log is also
attached in the hope that you can replicate the issue.

Chris


#0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1,
datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, src=1, tag=-12,
comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
#1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (
sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
at base/coll_base_allreduce.c:247
#2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (
sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630,
algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
#3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (
sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
at coll_tuned_decision_fixed.c:216
#4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20,
rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaecb80)
at coll_hcoll_ops.c:217
#5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20,
recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, 
op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30) at allreduce.c:123
#6 0x0000155553eabede in MPIU_Allreduce_Private () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#9 0x0000155553e51f3a in PetscLogNestedTreePrint () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#10 0x0000155553e51e96 in PetscLogNestedTreePrint () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#11 0x0000155553e51e96 in PetscLogNestedTreePrint () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#15 0x0000155553e56232 in PetscLogHandlerView () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#16 0x0000155553e588c3 in PetscLogView () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#17 0x0000155553e40eb5 in petsclogview_ () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#18 0x0000000000402c8b in MAIN__ ()
#19 0x00000000004023df in main ()
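
Editorial note on frames #0 to #5: the call that blocks is an MPI_Allreduce 
with count=1, the pair datatype shown as ompi_mpi_2dblprec (Open MPI's handle 
for MPI_2DOUBLE_PRECISION) and the operation MPI_MAXLOC. As a rough sketch of 
what that reduction computes, the pure-Python snippet below mimics MAXLOC on 
(value, rank) pairs. It does not use MPI at all, the helper names maxloc and 
maxloc_allreduce are hypothetical, and the two input values are made up; it 
says nothing about why the call hangs here.

```python
# Pure-Python illustration of MPI_MAXLOC semantics; no MPI library is used.
from functools import reduce


def maxloc(a, b):
    """Combine two (value, rank) pairs the way MPI_MAXLOC does."""
    (u, i), (v, j) = a, b
    if u > v:
        return (u, i)
    if u < v:
        return (v, j)
    return (u, min(i, j))  # equal values: the lower rank index wins


def maxloc_allreduce(values):
    """Hypothetical helper: values[r] is the contribution of rank r.

    Returns the list of results, one per rank; after an allreduce
    every rank holds the same (max value, owning rank) pair.
    """
    pairs = [(v, r) for r, v in enumerate(values)]
    result = reduce(maxloc, pairs)
    return [result] * len(values)


if __name__ == "__main__":
    # Made-up per-rank numbers for a two-rank run (as with "-n 2").
    print(maxloc_allreduce([0.25, 0.75]))
```

In real MPI the index travels in the same floating-point type as the value 
(both halves of MPI_2DOUBLE_PRECISION are double precision); the sketch keeps 
it as a Python int for readability.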



--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

Attachment: out
Description: out
