I cannot reproduce this.

> On Jul 10, 2025, at 3:46 PM, Junchao Zhang <junchao.zh...@gmail.com> wrote:
> 
> Adding -mca coll_hcoll_enable 0 didn't change anything at my end.  Strange. 
> 
> --Junchao Zhang
> 
> 
> On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan <c.kl...@marin.nl> wrote:
>> An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, 
>> the code does not hang but gives the error below.
>> 
>> Chris
>> 
>> 
>> $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi 
>> -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>> 0 KSP Residual norm 1.11803
>> 1 KSP Residual norm 0.591608
>> 2 KSP Residual norm 0.316228
>> 3 KSP Residual norm < 1.e-11
>> 0 KSP Residual norm 0.707107
>> 1 KSP Residual norm 0.408248
>> 2 KSP Residual norm < 1.e-11
>> Norm of error < 1.e-12 iterations 3
>> [1]PETSC ERROR: --------------------- Error Message 
>> --------------------------------------------------------------
>> [1]PETSC ERROR: General MPI error
>> [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
>> [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
>> [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on 
>> login1 by cklaij Thu Jul 10 10:33:33 2025
>> [1]PETSC ERROR: Configure options: 
>> --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs 
>> --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 
>> --with-mpe=0 --with-debugging=0 
>> --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz
>>  --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 
>> --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz
>> --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz
>> --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild
>>  --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall 
>> -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall 
>> -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall 
>> -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall 
>> -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops 
>> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
>> -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops 
>> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
>> -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops 
>> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
>> -Wno-unused-function -O3 -DNDEBUG"
>> [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at 
>> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
>> [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at 
>> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377
>> [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at 
>> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at 
>> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
>> [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at 
>> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
>> [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at 
>> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
>> [1]PETSC ERROR: #7 PetscLogHandlerView() at 
>> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
>> [1]PETSC ERROR: #8 PetscLogView() at 
>> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040
>> [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
>> Proc: [[55228,1],1]
>> Errorcode: 98
>> 
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> prterun has exited due to process rank 1 with PID 0 on node login1 calling
>> "abort". This may have caused other processes in the application to be
>> terminated by signals sent by prterun (as reported here).
>> --------------------------------------------------------------------------
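[Editor's note: for reference, the workaround being tested above can be applied in two equivalent ways, a sketch assuming Open MPI's standard MCA parameter precedence; the executable name is taken from the run above:]

```shell
# Two equivalent ways to disable Open MPI's hcoll collective component.

# 1. Per run, on the mpirun command line:
#    mpirun --mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi

# 2. Via the environment, picked up by any subsequent mpirun in this shell:
export OMPI_MCA_coll_hcoll_enable=0
echo "coll_hcoll_enable=$OMPI_MCA_coll_hcoll_enable"
```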
>> 
>> ________________________________________
>> dr. ir. Christiaan Klaij | senior researcher
>> Research & Development | CFD Development
>> T +31 317 49 33 44 | http://www.marin.nl
>> 
>> From: Klaij, Christiaan <c.kl...@marin.nl>
>> Sent: Thursday, July 10, 2025 10:15 AM
>> To: Junchao Zhang
>> Cc: PETSc users list
>> Subject: Re: [petsc-users] problem with nested logging, standalone example
>> 
>> Hi Junchao,
>> 
>> Thanks for testing. I've fixed the error, but unfortunately that doesn't 
>> change the behavior: the code still hangs as before, with the same stack 
>> trace...
>> 
>> Chris
>> 
>> ________________________________________
>> From: Junchao Zhang <junchao.zh...@gmail.com>
>> Sent: Tuesday, July 8, 2025 10:58 PM
>> To: Klaij, Christiaan
>> Cc: PETSc users list
>> Subject: Re: [petsc-users] problem with nested logging, standalone example
>> 
>> Hi, Chris,
>> First, I had to fix an error in your test by adding 
>> "PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.
>> [0]PETSC ERROR: --------------------- Error Message 
>> --------------------------------------------------------------
>> [0]PETSC ERROR: Object is in wrong state
>> [0]PETSC ERROR: Mat object's type is not set: Argument # 1
>> ...
>> [0]PETSC ERROR: #1 MatSetValues() at 
>> /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
>> [0]PETSC ERROR: #2 ex2f.F90:258
>> 
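[Editor's note: the missing call belongs in the standard PETSc matrix setup sequence. A sketch of the relevant lines in Fortran, assuming the surrounding context of ex2f.F90 (AA, m, n, and ierr declared there); not a verbatim excerpt of the test:]

```fortran
PetscCallA(MatCreate(PETSC_COMM_WORLD, AA, ierr))
PetscCallA(MatSetSizes(AA, PETSC_DECIDE, PETSC_DECIDE, m, n, ierr))
! The missing call: it sets the Mat type (from -mat_type or the default),
! without which the later MatSetValues() fails with "Object is in wrong state"
PetscCallA(MatSetFromOptions(AA, ierr))
PetscCallA(MatSetUp(AA, ierr))
```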
>> Then I could run the test without problems:
>> mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short 
>> -ksp_gmres_cgs_refinement_type refine_always
>> 0 KSP Residual norm 1.11803
>> 1 KSP Residual norm 0.591608
>> 2 KSP Residual norm 0.316228
>> 3 KSP Residual norm < 1.e-11
>> 0 KSP Residual norm 0.707107
>> 1 KSP Residual norm 0.408248
>> 2 KSP Residual norm < 1.e-11
>> Norm of error < 1.e-12 iterations 3
>> 
>> I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with
>> ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran 
>> --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 
>> -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall 
>> -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall 
>> -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall 
>> -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops 
>> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
>> -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops 
>> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
>> -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops 
>> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime 
>> -Wno-unused-function -O3 -DNDEBUG"
>> 
>> Could you fix the error and retry?
>> 
>> --Junchao Zhang
>> 
>> 
>> On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users 
>> <petsc-users@mcs.anl.gov> wrote:
>> Attached is a standalone example of the issue described in the
>> earlier thread "problem with nested logging". The issue appeared
>> somewhere between petsc 3.19.4 and 3.23.4.
>> 
>> The example is a variation of ../ksp/tutorials/ex2f.F90, where
>> I've added the nested log viewer with one event as well as the
>> solution of a small system on rank zero.
>> 
>> When running on multiple procs the example hangs during
>> PetscLogView with the backtrace below. The configure.log is also
>> attached in the hope that you can replicate the issue.
>> 
>> Chris
>> 
>> 
>> #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1,
>> datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, src=1, tag=-12,
>> comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
>> #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (
>> sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>> dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>> op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
>> at base/coll_base_allreduce.c:247
>> #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (
>> sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>> dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>> op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630,
>> algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
>> #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (
>> sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>> dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>> op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
>> at coll_tuned_decision_fixed.c:216
>> #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20,
>> rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>> op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaecb80)
>> at coll_hcoll_ops.c:217
>> #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20,
>> recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 
>> <ompi_mpi_2dblprec>, op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30) 
>> at allreduce.c:123
>> #6 0x0000155553eabede in MPIU_Allreduce_Private () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #15 0x0000155553e56232 in PetscLogHandlerView () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #16 0x0000155553e588c3 in PetscLogView () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #17 0x0000155553e40eb5 in petsclogview_ () from 
>> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #18 0x0000000000402c8b in MAIN__ ()
>> #19 0x00000000004023df in main ()
>> 
