On Thu, Jul 10, 2025 at 4:39 AM Klaij, Christiaan via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0,
> the code does not hang but gives the error below.
>

The error on its face should be impossible. On line 289 of xmlviewer.c
(frame #1, PetscLogNestedTreePrintLine()), we pass pointers to two variables
on the stack, so an invalid buffer pointer there would seem to indicate more
general memory corruption.

I know we asked before, but have you run under Address Sanitizer or
Valgrind?
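
In case it helps, here is a minimal sketch of a Valgrind run for your
two-rank case. The flags are standard Valgrind options; the executable name
and solver options are copied from your command line below, and the log file
name is only a placeholder. A build with debug symbols (-g) generally gives
more readable reports.

```shell
#!/bin/sh
# Sketch only: run each MPI rank under Valgrind's memcheck tool.
# %p in --log-file expands to the process ID, giving one log per rank.
# "valgrind.log" is a placeholder file name.
VG="valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p"
APP="./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always"
CMD="mpirun -mca coll_hcoll_enable 0 -n 2 $VG $APP"

# Print the command; set RUN=1 in the environment to actually execute it.
echo "$CMD"
if [ "${RUN:-0}" = "1" ]; then
    sh -c "$CMD"
fi
```

Any invalid read or write reported before the failing MPI_Allreduce would be
the first place I would look.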

  Thanks,

     Matt


> Chris
>
>
> $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi
> -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
> 0 KSP Residual norm 1.11803
> 1 KSP Residual norm 0.591608
> 2 KSP Residual norm 0.316228
> 3 KSP Residual norm < 1.e-11
> 0 KSP Residual norm 0.707107
> 1 KSP Residual norm 0.408248
> 2 KSP Residual norm < 1.e-11
> Norm of error < 1.e-12 iterations 3
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [1]PETSC ERROR: General MPI error
> [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
> [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
> [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on
> login1 by cklaij Thu Jul 10 10:33:33 2025
> [1]PETSC ERROR: Configure options:
> --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs
> --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0
> --with-mpe=0 --with-debugging=0
> --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz
> --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2
> --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz
> --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz
> --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild
> --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall
> -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall
> -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall
> -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall
> -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops
> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime
> -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops
> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime
> -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops
> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime
> -Wno-unused-function -O3 -DNDEBUG"
> [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at
> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
> [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at
> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377
> [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at
> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at
> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
> [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at
> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
> [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at
> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
> [1]PETSC ERROR: #7 PetscLogHandlerView() at
> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
> [1]PETSC ERROR: #8 PetscLogView() at
> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040
> [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> Proc: [[55228,1],1]
> Errorcode: 98
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> prterun has exited due to process rank 1 with PID 0 on node login1 calling
> "abort". This may have caused other processes in the application to be
> terminated by signals sent by prterun (as reported here).
> --------------------------------------------------------------------------
>
> ________________________________________
> dr. ir.  Christiaan  Klaij  |  senior researcher
> Research & Development  |  CFD Development
> T +31 317 49 33 44  |  http://www.marin.nl
>
>
> From: Klaij, Christiaan <c.kl...@marin.nl>
> Sent: Thursday, July 10, 2025 10:15 AM
> To: Junchao Zhang
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> Hi Junchao,
>
> Thanks for testing. I've fixed the error, but unfortunately that doesn't
> change the behavior: the code still hangs as before, with the same stack
> trace...
>
> Chris
>
> ________________________________________
> From: Junchao Zhang <junchao.zh...@gmail.com>
> Sent: Tuesday, July 8, 2025 10:58 PM
> To: Klaij, Christiaan
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> Hi, Chris,
> First, I had to fix an error in your test by adding
> "PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. Without it I got:
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Mat object's type is not set: Argument # 1
> ...
> [0]PETSC ERROR: #1 MatSetValues() at
> /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
> [0]PETSC ERROR: #2 ex2f.F90:258
>
> Then I could run the test without problems:
> mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short
> -ksp_gmres_cgs_refinement_type refine_always
> 0 KSP Residual norm 1.11803
> 1 KSP Residual norm 0.591608
> 2 KSP Residual norm 0.316228
> 3 KSP Residual norm < 1.e-11
> 0 KSP Residual norm 0.707107
> 1 KSP Residual norm 0.408248
> 2 KSP Residual norm < 1.e-11
> Norm of error < 1.e-12 iterations 3
>
> I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with
> ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran
> --download-openmpi --with-ssl=0 --with-shared-libraries=1
> CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG"
> CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG "
> COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG"
> CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG "
> FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0
> -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3
> -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0
> -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3
> -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0
> -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3
> -DNDEBUG"
>
> Could you fix the error and retry?
>
> --Junchao Zhang
>
>
> On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
> Attached is a standalone example of the issue described in the
> earlier thread "problem with nested logging". The issue appeared
> somewhere between petsc 3.19.4 and 3.23.4.
>
> The example is a variation of ../ksp/tutorials/ex2f.F90, where
> I've added the nested log viewer with one event as well as the
> solution of a small system on rank zero.
>
> When running on multiple procs the example hangs during
> PetscLogView with the backtrace below. The configure.log is also
> attached in the hope that you can replicate the issue.
>
> Chris
>
>
> #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1,
> datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, src=1, tag=-12,
> comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
> #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (
> sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
> dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
> op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
> at base/coll_base_allreduce.c:247
> #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (
> sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
> dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
> op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630,
> algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
> #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (
> sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
> dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
> op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
> at coll_tuned_decision_fixed.c:216
> #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20,
> rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
> op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaecb80)
> at coll_hcoll_ops.c:217
> #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20,
> recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900
> <ompi_mpi_2dblprec>, op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30)
> at allreduce.c:123
> #6 0x0000155553eabede in MPIU_Allreduce_Private () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #15 0x0000155553e56232 in PetscLogHandlerView () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #16 0x0000155553e588c3 in PetscLogView () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #17 0x0000155553e40eb5 in petsclogview_ () from
> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #18 0x0000000000402c8b in MAIN__ ()
> #19 0x00000000004023df in main ()
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
