Roman Gelfand wrote:
> For couple of months, now, I have this postfix smtp gateway on debian
> wheezy during which I had no problems with connectivity. Now, after
> couple of minutes I get disconnected from putty ssh session. The
> issue is not only there. Apache web server self updating cgi site
> dies after a while.
>
> How can I troubleshoot this?
Start by becoming familiar with the /var/log/* log files. Look through them and see if you find anything that gives clues to the problem. Start with these files:

  /var/log/syslog
  /var/log/kern.log

What you are describing sounds like something not specific to any one program but common to all of them. Therefore I suspect one of several possibilities.

 * bad memory DIMM, causing memory errors, causing process death
 * bad motherboard, causing general errors, causing process death
 * kernel bug, hitting processes
 * not enough memory, causing the Linux out-of-memory (OOM) killer to be activated and kill your active processes
 * cable problems in your system, such as a disk drive cable causing I/O transfer corruption between storage and system
 * possibly a failing disk drive
 * an endless list of other possibilities

Those are just ideas. To check a system I try to look for specific problems.

Run 'memtest86' or 'memtest86+' to look for RAM problems.

Being a hardware guy I will disassemble the machine and re-assemble it, because connectors tend to be unreliable. Carefully unplugging connectors and plugging them back in will scrub the contacts a little bit and can improve the connection.

I will look to see how much memory is available. I like the 'htop' program for this. It gives a nice bar graph that spatially shows the amount of memory used and where. If there is still a significant amount of memory used for file system buffer cache then life is good. If not, the buffer cache is being squeezed. But the possible problem for you is that if there isn't enough virtual memory then the Linux kernel will invoke the out-of-memory killer, which will start killing off active processes. Ensure that you have enough memory to avoid the OOM killer. (Or disable it entirely. I have ranted about turning off the OOM killer before.)

I would check the disk drive with smartctl. Is it logging errors? Run SMART tests and check the results. SMART isn't a good predictor of failure but sometimes it does confirm failure.
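The log check and the memory check described above can be scripted if you prefer that to reading by eye. The following is only a rough sketch in Python, not a definitive tool. The function names are hypothetical, and the regular expression is an assumption about what OOM-killer messages typically look like in kernel logs ("invoked oom-killer", "Out of memory: Kill process ..."), so adjust it to whatever your kernel actually prints. The memory summary reads the MemFree, Buffers, Cached and SwapFree fields from /proc/meminfo.

```python
import re

# Assumption: typical wording of OOM-killer messages in kernel logs.
# The exact text varies between kernel versions, so adjust as needed.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(ed)? process|Killed process \d+"
)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Summarize free memory, buffer cache and free swap, all in kB."""
    return {
        "free_kb": info.get("MemFree", 0),
        "cache_kb": info.get("Buffers", 0) + info.get("Cached", 0),
        "swap_free_kb": info.get("SwapFree", 0),
    }


def main():
    """Scan the two log files named above, then print a memory summary."""
    for path in ("/var/log/syslog", "/var/log/kern.log"):
        try:
            with open(path, errors="replace") as fh:
                for hit in find_oom_lines(fh):
                    print(path + ": " + hit)
        except OSError as exc:
            print("cannot read " + path + ": " + str(exc))
    try:
        with open("/proc/meminfo") as fh:
            print(memory_summary(parse_meminfo(fh.read())))
    except OSError as exc:
        print("cannot read /proc/meminfo: " + str(exc))
```

Call main() as a user who can read the log files (on Debian that usually means root or a member of the adm group). Any hits from the log scan point toward the not-enough-memory possibility. No hits does not rule out the hardware possibilities in the list.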
Hopefully those ideas help. Good luck!

Bob