Henning Fehrmann wrote:

> 
> I successfully spread a 10G file to 50 nodes. The rate was 140Mb/s for
> nettee and a bit slower using dolly.
> I guess it was due to a busy node somewhere in the chain.
> Increasing the number of clients to 100 failed in both cases.
> 
> For nettee I got:
> nettee: fatal error writing to child: Connection reset by peer
>
> I will do more systematic tests over the next few days.
> David Mathog, are you interested in bug reports?

Yes, please. 

If memory serves, you will see that error whenever a child node, or
nettee on that child, crashes.  For instance, if you "kill -9" nettee on
a child, the parent should see that.  The command option -colwf will let
the chain continue if the failure is caused by a full disk or a stdout
pipe failing.  The option -conwf should let the chain continue the
transfer down to the node just above the failed one, and it should tell
you which node failed, so long as -v is used with the appropriate bits.
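
As a rough illustration of why the parent reports a connection error when a child dies, here is a minimal Python sketch. It is not nettee's code and makes no claim about how nettee works internally. The function names (child_that_dies, write_until_failure, demo) are hypothetical, and the abrupt death of the child is only approximated by closing the socket with a zero SO_LINGER timeout, which causes a TCP reset rather than a normal shutdown.

```python
import socket
import struct
import threading


def child_that_dies(listener):
    """Accept one connection, then abort it abruptly (stand-in for a killed child)."""
    conn, _ = listener.accept()
    # Assumption for the demo: SO_LINGER with a zero timeout makes close()
    # send a TCP RST instead of an orderly FIN.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def write_until_failure(port, chunk=b"x" * 65536, max_chunks=1000):
    """Act as the parent: keep writing to the child until the connection fails.

    Returns the name of the exception class that ended the transfer,
    or None if every chunk was sent without error.
    """
    with socket.create_connection(("127.0.0.1", port)) as parent:
        try:
            for _ in range(max_chunks):
                parent.sendall(chunk)
        except ConnectionError as exc:
            # Typically ConnectionResetError ("Connection reset by peer")
            # or BrokenPipeError, depending on timing and platform.
            return type(exc).__name__
    return None


def demo():
    """Run a one-link 'chain' on localhost and report how the parent's write failed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    child = threading.Thread(target=child_that_dies, args=(listener,))
    child.start()
    result = write_until_failure(port)
    child.join()
    listener.close()
    return result


if __name__ == "__main__":
    print("parent write failed with:", demo())
```

The parent learns about the failure only when it next writes to the dead connection, which is why the error surfaces as a write error on the node just above the one that failed.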

Regards,

David Mathog
[EMAIL PROTECTED]
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
