Hi,

Thank you for your answers.

However, please ignore the links I posted; I did not mean to send them. I had googled the error 'Disconnecting:…' to find a solution for our cluster's problem. Because I could not find a proper solution that way, I posted the question to Beowulf, and I simply copied and pasted the 'Disconnecting:…' sentences into Gmail. That is why you can see links in my email.

Returning to our problem, the output of 'netstat -i' and 'netstat -s' is given below, in that order. Please note that:

a) I use cat 6 cable.
b) Electrical noise is very unlikely.
c) The head node has two NICs. eth0 is for the internal zone, i.e. the computing nodes, and it runs with no problem. eth1 is for the external zone, i.e. the interface our users connect to via ssh. This is the one with the disconnection problem.
d) There does not seem to be any SW/router problem, because another machine on the same network is reached by users via ssh with no problem.

___________________________________________________________________

[EMAIL PROTECTED] ~]# netstat -i
Kernel Interface table
Iface   MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 586745989      0      0      0 598858710      0      0      0 BMRU
eth1   1500   0    701868      0      0      0    325542      0      0      0 BMRU
lo    16436   0      1959      0      0      0      1959      0      0      0 LRU

[EMAIL PROTECTED] ~]# netstat -s
Ip:
    585891011 total packets received
    0 forwarded
    0 incoming packets discarded
    585887228 incoming packets delivered
    597668214 requests sent out
Icmp:
    34 ICMP messages received
    21 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 25
        timeout in transit: 5
        echo requests: 4
    601 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 597
        echo replies: 4
Tcp:
    78 active connections openings
    360 passive connection openings
    0 failed connection attempts
    18 connection resets received
    8 connections established
    585798178 segments received
    597666644 segments send out
    16197 segments retransmited
    94 bad segments received.
    1682 resets sent
Udp:
    1005 packets received
    596 packets to unknown port received.
    0 packet receive errors
    1019 packets sent
TcpExt:
    2 resets received for embryonic SYN_RECV sockets
    26 packets pruned from receive queue because of socket buffer overrun
    ArpFilter: 0
    60 TCP sockets finished time wait in fast timer
    1 packets rejects in established connections because of timestamp
    734435 delayed acks sent
    127 delayed acks further delayed because of locked socket
    Quick ack mode was activated 7963 times
    724 packets directly queued to recvmsg prequeue.
    6030 packets directly received from backlog
    164431 packets directly received from prequeue
    571897537 packets header predicted
    138 packets header predicted and directly queued to user
    TCPPureAcks: 44870
    TCPHPAcks: 458279645
    TCPRenoRecovery: 0
    TCPSackRecovery: 2875
    TCPSACKReneging: 0
    TCPFACKReorder: 0
    TCPSACKReorder: 0
    TCPRenoReorder: 0
    TCPTSReorder: 0
    TCPFullUndo: 0
    TCPPartialUndo: 0
    TCPDSACKUndo: 1
    TCPLossUndo: 7099
    TCPLoss: 626
    TCPLostRetransmit: 0
    TCPRenoFailures: 0
    TCPSackFailures: 1635
    TCPLossFailures: 169
    TCPFastRetrans: 4294
    TCPForwardRetrans: 23
    TCPSlowStartRetrans: 1130
    TCPTimeouts: 8329
    TCPRenoRecoveryFail: 0
    TCPSackRecoveryFail: 279
    TCPSchedulerFailed: 0
    TCPRcvCollapsed: 2731
    TCPDSACKOldSent: 8194
    TCPDSACKOfoSent: 0
    TCPDSACKRecv: 7125
    TCPDSACKOfoRecv: 0
    TCPAbortOnSyn: 0
    TCPAbortOnData: 28
    TCPAbortOnClose: 8
    TCPAbortOnMemory: 0
    TCPAbortOnTimeout: 12
    TCPAbortOnLinger: 0
    TCPAbortFailed: 0
    TCPMemoryPressures: 0

___________________________________________________________________

--
Best,
Ruhollah Moussavi Baygi

On 5/29/07, Robert G. Brown <[EMAIL PROTECTED]> wrote:
On Sun, 27 May 2007, Ruhollah Moussavi Baygi wrote:

> Hi everybody at Beowulf,
>
> I have a serious problem with ssh connections to our cluster. Any
> hint, help, or suggestion that could help me solve it is highly appreciated.
>
> Most of the time, when users try to connect and run their programs from
> their own PCs, the ssh connection fails, especially while transferring files
> from/to the head node. Our users' PCs mainly run Windows XP, so they use packages
> like SSH Secure Shell for connection and file transfer, or PuTTY for
> connection and WinSCP for file transfer.
>
>
> The error message is as follows:
>
> 'Disconnecting: Corrupted MAC on input'

This sounds to me like hardware problems. What does your physical
network look like? Is it built with the right cables, within spec, with
decent switches? Do you see other evidence of network packet
corruption?

> < http://www.google.com/history/url?url=http://ubuntuforums.org/showthread.php%3Ft%3D202076&ei=wkJZRsGfHZf-0gTehKXrDQ&sig2=lIzQGYq3zN0Tz2EC8b4dAw&zx=JGkABbsjtaA&ct=w >
>
> or
>
> 'Disconnecting: bad packet

Yes, sounds like bad hardware. Perhaps your cables aren't cat 5?
Perhaps your electrical power has noise? Perhaps your switch(es) are
broken or have been taken over by trolls? This sounds like you're
failing packet checksum tests or experiencing pretty serious TCP
collision problems. What do the network statistics look like on the
interfaces in question?

   rgb

> length...< http://www.google.com/search?q=disconnecting:+bad+packet+length+from+windows+to+linux+machine&hl=en >',
> followed by a long integer.
>
>
> This problem has practically made our cluster unusable. So, I would be
> thankful for any advice.
>

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]
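As a rough way to read the 'netstat -s' counters quoted in this message, the retransmission and bad-segment rates can be computed directly from the posted numbers. The sketch below is only illustrative: the constant and function names are hypothetical, the TCP counters are system-wide (so they are dominated by eth0 traffic), and the eth1 figure is a worst-case bound that assumes every retransmitted segment went out on eth1, which the output does not establish. TX-OK counts packets rather than TCP segments, so that bound is approximate.

```python
# Counters copied from the netstat output posted in this thread.
SEGMENTS_SENT = 597_666_644          # Tcp: "segments send out"
SEGMENTS_RETRANSMITTED = 16_197      # Tcp: "segments retransmited"
SEGMENTS_RECEIVED = 585_798_178      # Tcp: "segments received"
BAD_SEGMENTS_RECEIVED = 94           # Tcp: "bad segments received"
ETH1_TX_OK = 325_542                 # netstat -i: TX-OK for eth1


def ratio(part: int, whole: int) -> float:
    """Return part/whole, or 0.0 when whole is zero."""
    return part / whole if whole else 0.0


# System-wide rates: both NICs together, so eth0 dominates.
overall_retrans = ratio(SEGMENTS_RETRANSMITTED, SEGMENTS_SENT)
bad_rx = ratio(BAD_SEGMENTS_RECEIVED, SEGMENTS_RECEIVED)

# Worst case (an assumption, not a measurement): all retransmissions
# are attributed to eth1, the interface showing the ssh disconnects.
eth1_worst_case = min(1.0, ratio(SEGMENTS_RETRANSMITTED, ETH1_TX_OK))

print(f"overall retransmission rate: {overall_retrans:.6%}")
print(f"bad segments received rate:  {bad_rx:.6%}")
print(f"eth1 worst-case retrans:     {eth1_worst_case:.2%}")
```

On these numbers the overall retransmission rate is on the order of 0.003%, while the worst-case eth1 bound is about 5%. The system-wide totals alone cannot tell which is closer to reality, which is why they may hide a problem confined to the lightly used interface.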
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf