Hi,

On 24/12/20 at 17:16 +0100, Michael Banck wrote:
> Package: libopenmpi3
> Version: 3.1.3-11
> Severity: serious
> 
> Even with the fixed libpmix2_4.0.0~rc1-2, I am getting runtime failures
> trying to run MPI programs, e.g. the nwchem autopkgtests all fail like
> this:

A simple way to reproduce is:

$ mpiexec -n 1 true
[groff:16932] [[40958,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 320
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

It happens with these versions:

$ dpkg -l |grep -e openmpi -e pmi
ii  libopenmpi3:amd64             4.1.0-1                      amd64        high performance message passing library -- shared library
ii  libpmix2:amd64                4.0.0~rc1-2                  amd64        Process Management Interface (Exascale) library
ii  openmpi-bin                   4.1.0-1                      amd64        high performance message passing library -- binaries
ii  openmpi-common                4.1.0-1                      all          high performance message passing library -- common files

It doesn't fail after downgrading openmpi to the version in testing
(4.0.5-7).

Lucas
