On Wed, 2006-05-03 at 15:21 -0400, Joe Landman wrote:
> David Simas wrote:
> 
> > Except that it probably won't help with the problem, which I'm
> > guessing is caused by a given host attempting more than 1024
> > RSH connections to a given server in less than TCP TIME WAIT
> > seconds (minutes, whatever).  If the original correspondent
> 
> Actually it handles exactly these cases.  The FANOUT variable lets you 
> indicate the appropriate parallelism for rsh.  I believe pdsh is in use 
> on the big clusters ( > 1024 nodes at the national labs )

Nod.  I was pleased to learn of pdsh.  FWIW, loop doesn't try to run all
n at once either; its degree of parallelism is set with a command-line
option.
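
For anyone who wants to see the idea rather than take our word for it, here is a minimal sketch of bounded fan-out in Python. It is not how pdsh or loop are actually implemented. The names run_on_hosts, fanout, and runner are made up for illustration, and the default of 32 is an arbitrary choice. The only point is that no more than `fanout` remote shells are in flight at any moment.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def _remote_shell(host, command, shell="ssh"):
    # Hypothetical default runner: invoke "<shell> <host> <command>".
    # Pass shell="rsh" to use rsh instead; the calling convention is the same.
    proc = subprocess.run([shell, host, command],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout


def run_on_hosts(hosts, command, fanout=32, runner=None):
    """Run `command` on every host, with at most `fanout` running at once."""
    hosts = list(hosts)
    if runner is None:
        runner = _remote_shell
    # The pool size is the cap on concurrency: a new remote shell starts
    # only when one of the `fanout` workers becomes free.
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        results = list(pool.map(lambda h: runner(h, command), hosts))
    # pool.map preserves input order, so results line up with hosts.
    return dict(zip(hosts, results))
```

Something like run_on_hosts(["node01", "node02"], "uptime", fanout=2) would return a dict mapping each host to a (returncode, stdout) pair. Capping concurrency this way limits how many source ports are held open at once, though it does nothing about ports still sitting in TIME_WAIT.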

> > doesn't want to use SSH for RSH, which would fix things 
> 
> True, and you can use ssh with pdsh.  Or rsh.  With no syntax change to 
> the end user.
> 
> > SSH isn't restricted to low-numbered ports, he could try to
> > re-implement his application in MPI.
> 
> The basic question a few of us have is exactly what is Bruce and team 
> doing that is causing them to run out of ports.  Once we see this, we 
> can stop guessing and make better/targeted suggestions.

Yup, and strace/truss/whatever is your friend for that:

http://dcs.nac.uci.edu/~strombrg/debugging-with-syscall-tracers.html

...though based on the message, I'm guessing they are trying to run too
many rsh's in parallel, and hence running out of reserved ports.
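
A back-of-the-envelope sketch of that guess, in Python. Every number here is an assumption, not something measured on their cluster: that the rsh client draws its source ports from 512-1023, that each session takes two reserved ports (one for the command, one for the stderr channel), and that a used port stays unavailable for a 60-second TIME_WAIT. Real systems vary on all three.

```python
def reserved_port_budget(low_port=512, high_port=1023,
                         ports_per_connection=2):
    """How many sessions fit in the assumed reserved-port range at once."""
    usable_ports = high_port - low_port + 1
    return usable_ports // ports_per_connection


def max_launch_rate(budget, time_wait_seconds=60.0):
    """Sustained sessions per second to one server before ports run out,
    assuming every used port is held for the whole TIME_WAIT period."""
    return budget / time_wait_seconds


budget = reserved_port_budget()
print("sessions per TIME_WAIT window:", budget)
print("sustained sessions per second:", round(max_launch_rate(budget), 2))
```

Under those assumptions the budget is a few hundred sessions per window, which a tight loop over a large node list could plausibly exceed.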



_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
