Source: openmpi
Version: 3.1.3-7
Severity: important
Control: affects -1 mpgrafic
Dear Maintainer,
SUMMARY: MPI_Init fails with a PMIx error on GNU/Hurd (gds_dstore.c line 1030)
DESCRIPTION: MPI_Init aborts with a fatal PMIx error on GNU/Hurd.
This occurred during a Debian automatic build of mpgrafic-0.3.16-1
on a two-processor build machine. It also occurs on the single-processor
Debian porter machine exodar with a minimal example (provided here).
MINIMAL EXAMPLE:
On exodar:
$ cat openmpi_hurd_bug.f90
program openmpi_hurd_bug
call MPI_INIT(ierr)
call MPI_FINALIZE(ierr)
end program openmpi_hurd_bug
$ mpifort --show
gfortran -I/usr/lib/i386-gnu/openmpi/include -pthread
-I/usr/lib/i386-gnu/openmpi/lib -Wl,--enable-new-dtags
-L/usr/lib/i386-gnu/openmpi/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr
-lmpi_mpifh -lmpi
$ mpifort openmpi_hurd_bug.f90 # compiles with no warnings or errors.
$ mpirun -n 1 a.out
[exodar:00753] PMIX ERROR: INIT in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 1030
[exodar:00753] PMIX ERROR: ERROR in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 2863
[exodar:00753] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp_component.c at line 1423
[exodar:00755] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp.c at line 790
[exodar:00755] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[exodar:00755] Local abort before MPI_INIT completed completed successfully,
but am not able to aggregate error messages, and not able to guarantee that all
other processes were killed!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
C example on exodar:
$ cat openmpi_hurd_bug_C.c
#include <mpi.h>
int main(int argc, char **argv){
MPI_Init(&argc,&argv);
MPI_Finalize();
return 0;
}
$ mpicc --show
gcc -I/usr/lib/i386-gnu/openmpi/include/openmpi
-I/usr/lib/i386-gnu/openmpi/include -pthread -L/usr/lib/i386-gnu/openmpi/lib
-lmpi
$ mpicc openmpi_hurd_bug_C.c # no warnings or errors
$ mpirun -n 1 ./a.out
[exodar:00962] PMIX ERROR: INIT in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 1030
[exodar:00962] PMIX ERROR: ERROR in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 2863
[exodar:00962] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp_component.c at line 1423
[exodar:00964] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp.c at line 790
[exodar:00964] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[exodar:00964] Local abort before MPI_INIT completed completed successfully,
but am not able to aggregate error messages, and not able to guarantee that all
other processes were killed!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
CONTEXT of minimal example:
This is on an schroot running sid on exodar:
https://db.debian.org/machines.cgi?host=exodar
$ sessionid=$(schroot -b -c sid)
$ dd-schroot-cmd -c ${sessionid} apt-get update
$ dd-schroot-cmd -c ${sessionid} apt-get upgrade
$ dd-schroot-cmd -c $sessionid apt-get install mpifort mpi-default-dev mpi-default-bin gfortran
$ schroot -e -c ${sessionid}
$ uname -a
GNU exodar 0.9 GNU-Mach 1.8+git20181103-486-dbg/Hurd-0.9 i686-AT386 GNU
$ cat /proc/hostinfo
Basic info:
max_cpus = 1 /* max number of cpus possible */
avail_cpus = 1 /* number of cpus now available */
memory_size = 3221151744 /* size of memory in bytes */
cpu_type = 19 /* cpu type */
cpu_subtype = 1 /* cpu subtype */
$ dpkg -l |egrep "openmpi|pmix|gfortran|gcc|mpifort"
ii  gcc                          4:8.2.0-2   hurd-i386  GNU C compiler
ii  gcc-8                        8.2.0-13    hurd-i386  GNU C compiler
ii  gcc-8-base:hurd-i386         8.2.0-13    hurd-i386  GCC, the GNU Compiler Collection (base package)
ii  gfortran                     4:8.2.0-2   hurd-i386  GNU Fortran 95 compiler
ii  gfortran-8                   8.2.0-13    hurd-i386  GNU Fortran compiler
ii  libgcc-8-dev:hurd-i386       8.2.0-13    hurd-i386  GCC support library (development files)
ii  libgcc1:hurd-i386            1:8.2.0-13  hurd-i386  GCC support library
ii  libgfortran-8-dev:hurd-i386  8.2.0-13    hurd-i386  Runtime library for GNU Fortran applications (development files)
ii  libgfortran5:hurd-i386       8.2.0-13    hurd-i386  Runtime library for GNU Fortran applications
ii  libopenmpi-dev:hurd-i386     3.1.3-7     hurd-i386  high performance message passing library -- header files
ii  libopenmpi3:hurd-i386        3.1.3-7     hurd-i386  high performance message passing library -- shared library
ii  libpmix2:hurd-i386           3.0.2-2     hurd-i386  Process Management Interface (Exascale) library
ii  openmpi-bin                  3.1.3-7     hurd-i386  high performance message passing library -- binaries
ii  openmpi-common               3.1.3-7     all        high performance message passing library -- common files
REPRODUCIBILITY:
(1) Again on exodar, in the same context, with the compiled Fortran program:
$ mpirun -n 1 --mca plm_rsh_agent /bin/false ./a.out
gives the same error messages, apart from the process IDs in the message prefix ([exodar:00808]...).
(2) This bug was originally detected on an mpgrafic build:
https://buildd.debian.org/status/fetch.php?pkg=mpgrafic&arch=hurd-i386&ver=0.3.16-1&stamp=1546275082&raw=0
This looks like a debian openmpi system.
[ironforge:13033] PMIX ERROR: INIT in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 1030
[ironforge:13033] PMIX ERROR: ERROR in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 2863
[ironforge:13033] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp_component.c at line 1423
[ironforge:13035] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp.c at line 790
[ironforge:13035] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[ironforge:13035] Local abort before MPI_INIT completed completed successfully,
but am not able to aggregate error messages, and not able to guarantee that all
other processes were killed!
Cheers
Boud