On Mon, Oct 16, 2017 at 7:16 AM, Peter Kjellström <c...@nsc.liu.se> wrote:
> Another is that your MPIs tried to use rdmacm and that in turn tried to
> use ibacm which, if incorrectly setup, times out after ~1m. You can
> verify ibacm functionality by running for example:
>
> user@n1 $ ib_acme -d n2
> ...
> user@n1 $
>
> This should be near instant if ibacm works as it should.

I didn't specifically tell MPI to use one connection-setup method vs. another,
but I'll see if I can track down what Open MPI is doing in that regard.
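
For what it's worth, my plan is to ask Open MPI which connection managers
(CPCs) its openib BTL will consider, and then exclude rdmacm to see if the
hang goes away. A sketch, assuming the openib BTL is in use (./a.out stands
in for any MPI binary):

user@n1 $ ompi_info --param btl openib --level 9 | grep cpc
...
user@n1 $ mpirun --mca btl_openib_cpc_exclude rdmacm -np 2 ./a.out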

However, your test above fails on my machines:

user@n1# ib_acme -d n3
service: localhost
destination: n3
ib_acm_resolve_ip failed: cannot assign requested address
return status 0x0
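
If I'm reading that right, "cannot assign requested address" is
EADDRNOTAVAIL, which presumably means ibacm couldn't match n3 against
anything in its address table. Two quick checks I'll try first (assuming
systemd; the IP below is just a stand-in for n3's IB address):

user@n1 $ systemctl status ibacm
user@n1 $ ib_acme -f i -d 192.168.1.3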

The /etc/rdma/ibacm_addr.cfg file just lists the data specific to each
host, as gathered by ib_acme -A.
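
For reference, the relevant lines in that file look roughly like the below
on my hosts; if I understand the format ib_acme -A generates, the columns
are address, device, port, and pkey (the hostname, IP, and device name
here are illustrative):

n3            mlx4_0   1   default
192.168.1.3   mlx4_0   1   default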

Truthfully, I never configured it; I thought it just "worked" on its own,
but perhaps not. I'll have to google some.
