Hi,

I have performed a couple of additional tests.

I have run
 mpirun --mca btl_tcp_if_include lo --mca btl tcp,self -np 2 \
        --mca plm_rsh_agent /bin/true slave
under a variety of environments, with a surprising result: it works in
a sid chroot in some circumstances but not in others.

Results:

- plain jessie, networking on:                   PASS
- plain jessie, networking off:                  PASS
- jessie VirtualBox, networking on:              PASS
- jessie VirtualBox, networking off:             PASS
- jessie chroot (jessie host), networking on:    PASS
- jessie chroot (jessie host), networking off:   PASS

- unstable chroot (jessie host), networking on:  PASS
- unstable chroot (jessie host), networking off: FAIL
- unstable VirtualBox:                           FAIL

The error log from the "unstable chroot (jessie host), networking off"
attempt is appended below.


Kind regards, Thibaut.

On 25/11/2016 at 08:51, Thibaut Paumard wrote:
> Control: retitle 845594 "openmpi: lo interface broken in the tcp btl"
> 
> Hi,
> 
> Actually the regression can also be demonstrated without using
> MPI_Comm_spawn with:
>  mpirun -np 2 --mca btl_tcp_if_include lo --mca btl tcp,self ./slave
> 
> The above command runs fine under jessie (openmpi 1.6.5-9.1) but fails
> under sid.
> 
> For the record, my test environment is a production machine for
> jessie and a VirtualBox virtual machine for sid.
> 
> Kind regards, Thibaut.
> 

$ mpirun --mca btl_tcp_if_include lo --mca btl tcp,self -np 2 --mca plm_rsh_agent /bin/true slave
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[62385,1],0]) is on host: tantive-iv
  Process 2 ([[62385,1],1]) is on host: tantive-iv
  BTLs attempted: tcp self

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
[tantive-iv:03081] 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
[tantive-iv:03081] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[tantive-iv:03081] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
