Source: openmpi
Version: 3.1.3-7
Severity: important
Control: affects -1 mpgrafic

Dear Maintainer,

SUMMARY: MPI_Init fails with a fatal PMIx error on GNU/Hurd (gds_dstore.c line 1030)

DESCRIPTION: MPI_Init gives a fatal PMIx error on GNU/Hurd.
This occurred in a Debian automatic build of mpgrafic 0.3.16-1 on a
two-processor machine, and for a minimal example (provided below) on the
single-processor Debian porter machine exodar.


MINIMAL EXAMPLE:
On exodar:

$ cat openmpi_hurd_bug.f90

program openmpi_hurd_bug
  use mpi
  implicit none
  integer :: ierr
  call MPI_INIT(ierr)
  call MPI_FINALIZE(ierr)
end program openmpi_hurd_bug


$ mpifort --show
gfortran -I/usr/lib/i386-gnu/openmpi/include -pthread -I/usr/lib/i386-gnu/openmpi/lib -Wl,--enable-new-dtags -L/usr/lib/i386-gnu/openmpi/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi

$ mpifort openmpi_hurd_bug.f90 # compiles with no warnings or errors.

$ mpirun -n 1 a.out
 [exodar:00753] PMIX ERROR: INIT in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 1030
 [exodar:00753] PMIX ERROR: ERROR in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 2863
 [exodar:00753] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp_component.c at line 1423
 [exodar:00755] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp.c at line 790
 [exodar:00755] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
 *** An error occurred in MPI_Init
 *** on a NULL communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***    and potentially your MPI job)
 [exodar:00755] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
 --------------------------------------------------------------------------
 Primary job  terminated normally, but 1 process returned
 a non-zero exit code. Per user-direction, the job has been aborted.
 --------------------------------------------------------------------------

C example on exodar:

$ cat openmpi_hurd_bug_C.c

#include <mpi.h>
int main(int argc, char **argv){
  MPI_Init(&argc,&argv);
  MPI_Finalize();
  return 0;
}

$ mpicc --show
gcc -I/usr/lib/i386-gnu/openmpi/include/openmpi -I/usr/lib/i386-gnu/openmpi/include -pthread -L/usr/lib/i386-gnu/openmpi/lib -lmpi

$ mpicc openmpi_hurd_bug_C.c # no warnings or errors

$ mpirun -n 1 ./a.out
[exodar:00962] PMIX ERROR: INIT in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 1030
[exodar:00962] PMIX ERROR: ERROR in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 2863
[exodar:00962] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp_component.c at line 1423
[exodar:00964] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp.c at line 790
[exodar:00964] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[exodar:00964] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------



CONTEXT of minimal example:

This is in a sid schroot session on exodar:
https://db.debian.org/machines.cgi?host=exodar


$ sessionid=$(schroot -b -c sid)
$ dd-schroot-cmd -c ${sessionid} apt-get update
$ dd-schroot-cmd -c ${sessionid} apt-get upgrade
$ dd-schroot-cmd -c $sessionid apt-get install mpifort mpi-default-dev mpi-default-bin gfortran
$ schroot -e -c ${sessionid}
$ uname -a
 GNU exodar 0.9 GNU-Mach 1.8+git20181103-486-dbg/Hurd-0.9 i686-AT386 GNU

$ cat /proc/hostinfo
Basic info:
max_cpus        =          1    /* max number of cpus possible */
avail_cpus      =          1    /* number of cpus now available */
memory_size     = 3221151744    /* size of memory in bytes */
cpu_type        =         19    /* cpu type */
cpu_subtype     =          1    /* cpu subtype */

$ dpkg -l |egrep "openmpi|pmix|gfortran|gcc|mpifort"
ii  gcc                               4:8.2.0-2                hurd-i386    GNU C compiler
ii  gcc-8                             8.2.0-13                 hurd-i386    GNU C compiler
ii  gcc-8-base:hurd-i386              8.2.0-13                 hurd-i386    GCC, the GNU Compiler Collection (base package)
ii  gfortran                          4:8.2.0-2                hurd-i386    GNU Fortran 95 compiler
ii  gfortran-8                        8.2.0-13                 hurd-i386    GNU Fortran compiler
ii  libgcc-8-dev:hurd-i386            8.2.0-13                 hurd-i386    GCC support library (development files)
ii  libgcc1:hurd-i386                 1:8.2.0-13               hurd-i386    GCC support library
ii  libgfortran-8-dev:hurd-i386       8.2.0-13                 hurd-i386    Runtime library for GNU Fortran applications (development files)
ii  libgfortran5:hurd-i386            8.2.0-13                 hurd-i386    Runtime library for GNU Fortran applications
ii  libopenmpi-dev:hurd-i386          3.1.3-7                  hurd-i386    high performance message passing library -- header files
ii  libopenmpi3:hurd-i386             3.1.3-7                  hurd-i386    high performance message passing library -- shared library
ii  libpmix2:hurd-i386                3.0.2-2                  hurd-i386    Process Management Interface (Exascale) library
ii  openmpi-bin                       3.1.3-7                  hurd-i386    high performance message passing library -- binaries
ii  openmpi-common                    3.1.3-7                  all          high performance message passing library -- common files



REPRODUCIBILITY:

(1) Again on exodar, same context, compiled fortran file:

$ mpirun -n 1 --mca plm_rsh_agent /bin/false ./a.out

gives the same error messages, apart from the process IDs in the [exodar:00808]-style prefixes.


(2) This bug was originally detected on an mpgrafic build:
https://buildd.debian.org/status/fetch.php?pkg=mpgrafic&arch=hurd-i386&ver=0.3.16-1&stamp=1546275082&raw=0

This looks like a debian openmpi system.
[ironforge:13033] PMIX ERROR: INIT in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 1030
[ironforge:13033] PMIX ERROR: ERROR in file ../../../../../../src/mca/gds/ds12/gds_dstore.c at line 2863
[ironforge:13033] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp_component.c at line 1423
[ironforge:13035] PMIX ERROR: UNREACHABLE in file ../../../../../../src/mca/ptl/tcp/ptl_tcp.c at line 790
[ironforge:13035] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[ironforge:13035] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!


Cheers
Boud
