On Thu, 11 Sep 2014, lee wrote:
> "Go down" can have various meanings. When you run a server and a
> server process (like an MTA or an IMAP or web server) is killed
> because the system runs out of memory, the server is effectively down.

This is why you use something like systemd, which is capable of
tracking processes and restarting them when they are killed or fail
for whatever reason.

> It may not be unstable (though I consider a system without an
> operational MTA as non-functional), yet you never know what process
> will be killed.

You're trading having a few processes killed off (often, the very
process which is consuming too much memory) against thrashing, with
all processes either being just slow (if you're lucky) or so slow that
they hit timeouts. If a machine is thrashing swap that badly, it might
as well be down.

Worse, when a machine is thrashing that badly, it's often impossible
to see what is happening on it at all, because even starting a shell
(or launching processes) requires swap. All you can do is use magic
sysrq and hope that it will give you enough information about what is
going on for you to kill something off.

> You could have ZFS with fuse, and what prevents such processes from
> being killed?

You can inform the OOM killer which processes should not be killed
fairly trivially. [Things like fuse, sshd, and similar should already
be informing the OOM killer that they should not be killed;[1] if not,
that's a bug.]

1: For ssh this is already the case.

-- 
Don Armstrong                      http://www.donarmstrong.com

The major difference between a thing that might go wrong and a thing
that cannot possibly go wrong is that when a thing that cannot
possibly go wrong goes wrong it usually turns out to be impossible to
get at or repair.
 -- Douglas Adams _Mostly Harmless_
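
P.S. To make the "inform the OOM killer" point concrete, here is a
minimal sketch, assuming the Linux /proc/<pid>/oom_score_adj interface
(range -1000 to 1000, where -1000 exempts the process from the OOM
killer). The function names and the proc_root parameter are made up
for illustration and are not part of any existing tool. Lowering the
value normally requires root (CAP_SYS_RESOURCE).

```python
from pathlib import Path

# Documented bounds of oom_score_adj on Linux; -1000 disables OOM
# killing for the process entirely.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write `value` to <proc_root>/<pid>/oom_score_adj.

    proc_root is a hypothetical parameter so the function can be
    exercised against a scratch directory instead of the real /proc.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj out of range: {value}")
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    path.write_text(f"{value}\n")
    return path


def get_oom_score_adj(pid, proc_root="/proc"):
    """Read the current adjustment back as an integer."""
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    return int(path.read_text().strip())
```

Something like set_oom_score_adj(pid, -1000), run as root against the
pid of a critical daemon, is the by-hand version. If the service is
managed by systemd, the OOMScoreAdjust= setting in the unit file does
the same job, and can sit next to Restart= so that the service also
comes back if it does get killed.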