Package: libopenmpi3 Version: 4.1.2~rc1-4 Severity: important Control: affects -1 src:fenics-dolfinx
fenics-dolfinx FTBFS on 32-bit arches, i386, armel, armhf, see https://buildd.debian.org/status/package.php?p=fenics-dolfinx&suite=experimental https://buildd.debian.org/status/fetch.php?pkg=fenics-dolfinx&arch=i386&ver=1%3A0.3.0-3&stamp=1633022115&raw=0 (ignore the 2021-10-02 failed builds using libhdf5-mpi-dev 1.10.7+repack-2, they need libcurl-dev, i.e. libcurl4-openssl-dev, see Bug##995594 ) The symptom in dolfinx build logs is signal number 11 SEGV: Segmentation Violation, probably memory access out of range when running demo_poisson_mpi_2 PETSc suggests running with -start_in_debugger. When I do that on i386 porterbox and run demo_poisson manually with 2 processes, it gives a more detailed backtrace: (experimental_i386-dchroot)barriere$ mpiexec -n 2 ./demo_poisson -start_in_debugger PETSC: Attaching gdb to ./demo_poisson of pid 5638 on display :0.0 on machine barriere PETSC: Attaching gdb to ./demo_poisson of pid 5639 on display :0.0 on machine barriere Unable to start debugger in xterm: No such file or directory Unable to start debugger in xterm: No such file or directory [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) To prevent termination, change the error handler using PetscPushErrorHandler() [barriere:05638] *** Process received signal *** [barriere:05638] Signal: Aborted (6) [barriere:05638] Signal code: (-6) [barriere:05638] [ 0] linux-gate.so.1(__kernel_rt_sigreturn+0x0)[0xf7f32090] [barriere:05638] [ 1] linux-gate.so.1(__kernel_vsyscall+0x9)[0xf7f32069] [barriere:05638] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0xc6)[0xf5f00f36] [barriere:05638] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x125)[0xf5ee9312] [barriere:05638] [ 4] /usr/lib/petscdir/petsc3.14/i386-linux-gnu-real/lib/libpetsc_real.so.3.14(+0x153d26)[0xf653dd26] [barriere:05638] [ 5] /usr/lib/petscdir/petsc3.14/i386-linux-gnu-real/lib/libpetsc_real.so.3.14(PetscError+0xd0)[0xf653a3b0] [barriere:05638] [ 6] /usr/lib/petscdir/petsc3.14/i386-linux-gnu-real/lib/libpetsc_real.so.3.14(PetscSignalHandlerDefault+0x1a0)[0xf653e790] [barriere:05638] [ 7] /usr/lib/petscdir/petsc3.14/i386-linux-gnu-real/lib/libpetsc_real.so.3.14(+0x154979)[0xf653e979] [barriere:05638] [ 8] linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f32080] [barriere:05638] [ 9] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx4mesh16build_dual_graphEP19ompi_communicator_tRKNS_5graph13AdjacencyListIxEEi+0xd7f)[0xf7ead68f] [barriere:05638] [10] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx4mesh21partition_cells_graphEP19ompi_communicator_tiiRKNS_5graph13AdjacencyListIxEENS0_9GhostModeERKSt8functionIFNS4_IiEES2_iS7_ibEE+0x21d)[0xf7ebeb9d] [barriere:05638] [11] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx4mesh21partition_cells_graphEP19ompi_communicator_tiiRKNS_5graph13AdjacencyListIxEENS0_9GhostModeE+0x59)[0xf7ebece9] [barriere:05638] [12] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZNSt17_Function_handlerIFKN7dolfinx5graph13AdjacencyListIiEEP19ompi_communicator_tiiRKNS2_IxEENS0_4mesh9GhostModeEEPFS3_S6_iiS9_SB_EE9_M_invokeERKSt9_Any_dataOS6_OiSK_S9_OSB_+0x35)[0xf7dff8b5] [barriere:05638] [13] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx4mesh11create_meshEP19ompi_communicator_tRKNS_5graph13AdjacencyListIxEERKNS_3fem17CoordinateElementERKN2xt17xtensor_containerINSC_7uvectorIdSaIdEEELj2ELNSC_11layout_typeE1ENSC_22xtensor_expression_tagEEENS0_9GhostModeERKSt8functionIFKNS4_IiEES2_iiS7_SM_EE+0x163)[0xf7e96d63] [barriere:05638] [14] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(+0x10fdab)[0xf7dfedab] [barriere:05638] [15] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx10generation13RectangleMesh6createEP19ompi_communicator_tRKSt5arrayIS4_IdLj3EELj2EES4_IjLj2EENS_4mesh8CellTypeENSA_9GhostModeERKSt8functionIFKNS_5graph13AdjacencyListIiEES3_iiRKNSF_IxEESC_EERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0xb8)[0xf7dff7c8] [barriere:05638] [16] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx10generation13RectangleMesh6createEP19ompi_communicator_tRKSt5arrayIS4_IdLj3EELj2EES4_IjLj2EENS_4mesh8CellTypeENSA_9GhostModeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x5f)[0xf7dff83f] [barriere:05638] [17] ./demo_poisson(+0x19953)[0x565ed953] [barriere:05638] [18] /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0x106)[0xf5eeafd6] [barriere:05638] [19] ./demo_poisson(+0x18451)[0x565ec451] [barriere:05638] *** End of error message *** -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [1]PETSC ERROR: to get more information on the crash. [1]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) To prevent termination, change the error handler using PetscPushErrorHandler() [barriere:05639] *** Process received signal *** [barriere:05639] Signal: Aborted (6) [barriere:05639] Signal code: (-6) [barriere:05639] [ 0] linux-gate.so.1(__kernel_rt_sigreturn+0x0)[0xf7f66090] [barriere:05639] [ 1] linux-gate.so.1(__kernel_vsyscall+0x9)[0xf7f66069] [barriere:05639] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0xc6)[0xf5f34f36] [barriere:05639] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x125)[0xf5f1d312] [barriere:05639] [ 4] /usr/lib/petscdir/petsc3.14/i386-linux-gnu-real/lib/libpetsc_real.so.3.14(+0x153d26)[0xf6571d26] [barriere:05639] [ 5] /usr/lib/petscdir/petsc3.14/i386-linux-gnu-real/lib/libpetsc_real.so.3.14(PetscError+0xd0)[0xf656e3b0] [barriere:05639] [ 6] /usr/lib/petscdir/petsc3.14/i386-linux-gnu-real/lib/libpetsc_real.so.3.14(PetscSignalHandlerDefault+0x1a0)[0xf6572790] [barriere:05639] [ 7] /usr/lib/petscdir/petsc3.14/i386-linux-gnu-real/lib/libpetsc_real.so.3.14(+0x154979)[0xf6572979] [barriere:05639] [ 8] linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f66080] [barriere:05639] [ 9] /usr/lib/i386-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x4cd6)[0xf1354cd6] [barriere:05639] [10] /usr/lib/i386-linux-gnu/libopen-pal.so.40(opal_progress+0x30)[0xf4dcde70] [barriere:05639] [11] /usr/lib/i386-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0xf4dd4a5d] [barriere:05639] [12] /usr/lib/i386-linux-gnu/libmpi.so.40(ompi_request_default_wait+0x236)[0xf7a1b2c6] [barriere:05639] [13] /usr/lib/i386-linux-gnu/libmpi.so.40(ompi_coll_base_sendrecv_actual+0xbb)[0xf7a73b2b] [barriere:05639] [14] /usr/lib/i386-linux-gnu/libmpi.so.40(ompi_coll_base_alltoall_intra_pairwise+0xf7)[0xf7a77b67] [barriere:05639] [15] /usr/lib/i386-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so(ompi_coll_tuned_alltoall_intra_do_this+0x11d)[0xf11b28ed] [barriere:05639] [16] /usr/lib/i386-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so(ompi_coll_tuned_alltoall_intra_dec_fixed+0x99)[0xf11adca9] [barriere:05639] [17] /usr/lib/i386-linux-gnu/libmpi.so.40(MPI_Alltoall+0x182)[0xf7a2f6d2] [barriere:05639] [18] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx3MPI10all_to_allIxEENS_5graph13AdjacencyListIT_EEP19ompi_communicator_tRKS5_+0x15d)[0xf7eb95ed] [barriere:05639] [19] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx4mesh16build_dual_graphEP19ompi_communicator_tRKNS_5graph13AdjacencyListIxEEi+0xdd6)[0xf7ee16e6] [barriere:05639] [20] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx4mesh21partition_cells_graphEP19ompi_communicator_tiiRKNS_5graph13AdjacencyListIxEENS0_9GhostModeERKSt8functionIFNS4_IiEES2_iS7_ibEE+0x21d)[0xf7ef2b9d] [barriere:05639] [21] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx4mesh21partition_cells_graphEP19ompi_communicator_tiiRKNS_5graph13AdjacencyListIxEENS0_9GhostModeE+0x59)[0xf7ef2ce9] [barriere:05639] [22] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZNSt17_Function_handlerIFKN7dolfinx5graph13AdjacencyListIiEEP19ompi_communicator_tiiRKNS2_IxEENS0_4mesh9GhostModeEEPFS3_S6_iiS9_SB_EE9_M_invokeERKSt9_Any_dataOS6_OiSK_S9_OSB_+0x35)[0xf7e338b5] [barriere:05639] [23] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx4mesh11create_meshEP19ompi_communicator_tRKNS_5graph13AdjacencyListIxEERKNS_3fem17CoordinateElementERKN2xt17xtensor_containerINSC_7uvectorIdSaIdEEELj2ELNSC_11layout_typeE1ENSC_22xtensor_expression_tagEEENS0_9GhostModeERKSt8functionIFKNS4_IiEES2_iiS7_SM_EE+0x163)[0xf7ecad63] [barriere:05639] [24] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(+0x10fbb2)[0xf7e32bb2] [barriere:05639] [25] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx10generation13RectangleMesh6createEP19ompi_communicator_tRKSt5arrayIS4_IdLj3EELj2EES4_IjLj2EENS_4mesh8CellTypeENSA_9GhostModeERKSt8functionIFKNS_5graph13AdjacencyListIiEES3_iiRKNSF_IxEESC_EERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0xb8)[0xf7e337c8] [barriere:05639] [26] /fenics/fenics-dolfinx-0.3.0/debian/tmp-real/usr/lib/i386-linux-gnu/libdolfinx_real.so.0.3(_ZN7dolfinx10generation13RectangleMesh6createEP19ompi_communicator_tRKSt5arrayIS4_IdLj3EELj2EES4_IjLj2EENS_4mesh8CellTypeENSA_9GhostModeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x5f)[0xf7e3383f] [barriere:05639] [27] ./demo_poisson(+0x19953)[0x5658d953] [barriere:05639] [28] /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0x106)[0xf5f1efd6] [barriere:05639] [29] ./demo_poisson(+0x18451)[0x5658c451] [barriere:05639] *** End of error message *** Since there two processes, there are 2 backtraces here. The first is in libdolfinx_real.so and reaches dolfinx::mesh::build_dual_graph which seems to be using ompi_communicator to access dolfinx's graph::AdjacencyList, before the segfault is reached. The second process seems to be accessing MPI::all_to_all for the graph::AdjacencyList and then goes from libdolfinx_real.so to libmpi.so, then libopen-pal.so and finally reaches mca_btl_vader.so before hitting the segfault. So if I'm reading the trace correctly, the segfault is triggered inside mca_btl_vader.so This is kind of Severity:serious (FTBFS), except that fenics-dolfinx is in experimental, and the old version of dolfinx in unstable doesn't crash this way. I'm hoping to push the new version of dolfinx to unstable soon though.