On Wed, 2006-05-03 at 15:21 -0400, Joe Landman wrote:
> David Simas wrote:
>
> > Except that it probably won't help with the problem, which I'm
> > guessing is caused by a given host attempting more than 1024
> > RSH connections to a given server in less than TCP TIME WAIT
> > seconds (minutes, whatever).  If the original correspondent
>
> Actually it handles exactly these cases.  The FANOUT variable lets you
> indicate the appropriate parallelism for rsh.  I believe pdsh is in use
> on the big clusters ( > 1024 nodes at the national labs )

Nod.  I was pleased to learn of pdsh.

FWIW, loop doesn't try to run all n at once either, though this degree
of parallelism is controlled with a command line option.

> > doesn't want to use SSH for RSH, which would fix things
>
> True, and you can use ssh with pdsh.  Or rsh.  With no syntax change to
> the end user.
>
> > SSH isn't restricted to low-numbered ports, he could try to
> > re-implement his application in MPI.
>
> The basic question a few of us have is exactly what is Bruce and team
> doing that is causing them to run out of ports.  Once we see this, we
> can stop guessing and make better/targetted suggestions.

Yup, and strace/truss/whatever is your friend for that:

http://dcs.nac.uci.edu/~strombrg/debugging-with-syscall-tracers.html

...though based on the message, I'm guessing they are trying to run too
many rsh's in parallel, and hence running out of reserved ports.
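
For illustration only, here is a rough sketch of the bounded fan-out idea
discussed above: run a command on many hosts, but never keep more than a
fixed number of remote sessions in flight at once. This is not how pdsh or
loop is implemented. The names run_with_fanout and rsh_runner, and the
default of 32, are made up for this example. Note that capping concurrency
only limits simultaneous connections; ports left in TIME_WAIT by sessions
that have already finished may still be unavailable for a while.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def rsh_runner(host, command):
    """Hypothetical helper: run `command` on `host` via rsh, return exit status."""
    return subprocess.run(["rsh", host, command]).returncode


def run_with_fanout(hosts, command, fanout=32, runner=rsh_runner):
    """Run `command` on every host with at most `fanout` sessions at a time.

    `fanout=32` is an arbitrary illustrative default. `runner` can be
    swapped (for example for an ssh-based helper) without changing callers.
    """
    hosts = list(hosts)  # iterated twice below, so materialise it first
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        # The pool never runs more than `fanout` runner calls concurrently.
        statuses = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, statuses))
```

Something like run_with_fanout(["node01", "node02"], "uptime", fanout=16)
would return a dict mapping each host to its exit status. The host names
here are placeholders.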