shutdown does nothing

Zhang Weiwu Mon, 17 Feb 2014 21:35:22 -0800


On Tue, 18 Feb 2014, Zhang Weiwu wrote:

I have excluded another possibility.

I am thinking:

        1) perhaps the message in /var/log/messages is not produced by init,
        but by reboot/halt/shutdown, and

        2) perhaps init is not invoked at all.
So I ran 'init 6' as root. This time, there was no new message in /var/log/messages, proving 1), and 'init 6' did absolutely nothing, disproving 2).

I was wrong. 'init 6' behaves differently from reboot/halt/shutdown. It did shut down a lot of services; my last post was sent a few seconds too early.

Among the services 'init 6' shut down (which reboot/halt/shutdown failed to) are:

- (in /etc/rc6.d) apache2
- (in /etc/rc6.d) mysql
- (in /etc/rc6.d) exim4

The services 'init 6' did NOT shut down are:

- portmap (manually running "/etc/rc6.d/K06portmap stop" worked)
- networking (because I can still establish a new ssh connection to this server)
- rsyslogd

I have:

$ ls /etc/rc6.d/
K01apache2                K02mysql         K06portmap     K10lvm2
K01atd                    K03sendsigs      K07hwclock.sh  K11umountroot
K01exim4                  K04rsyslog       K07networking  K12reboot
K01urandom                K05umountnfs.sh  K08ifupdown    README
K01xe-linux-distribution  K06nfs-common    K09umountfs

My suspicion is that K03sendsigs failed, because the K02* service was stopped, but the K04* one and K06portmap weren't. K03sendsigs is in between. ps(1) shows sendsigs running:


$ ps ax | grep init
    1 ?        Ss     0:39 init [6]
19299 ?        Ss     0:00 /bin/sh /etc/init.d/rc 6
19401 ?        S      0:00 /bin/sh /etc/init.d/sendsigs stop
23319 pts/9    S+     0:00 grep init

So the task is to figure out what sendsigs does and why it hangs.
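
The ordering argument above can be replayed mechanically. This is only a minimal sketch, assuming rc walks the K* links one after another in the order a shell glob sorts them (parallel shutdown setups may differ). The function name kill_order is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the K* links of a runlevel directory
# in the order a plain shell glob expands them.
kill_order() {
    dir=${1:-/etc/rc6.d}
    for f in "$dir"/K*; do
        # Skip the literal pattern when nothing matches,
        # but keep dangling symlinks in the list.
        if [ -e "$f" ] || [ -L "$f" ]; then
            basename "$f"
        fi
    done
}
```

With the listing above, "kill_order /etc/rc6.d" would put K03sendsigs after K02mysql and before K04rsyslog, which matches where the shutdown stalled.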

There is no manual, so I went the hard way and read its source: it does the "Asking all remaining processes to terminate" thing.

So I suppose some daemon refuses to succumb, and sendsigs is either waiting for it, or failed to kill it the nasty way and is thus confused. I looked at /var/run:


$ ls -F /var/run/
apache2/      ldapi@           portmap.pid    screen/        sshd.pid
crond.pid     motd             portmap.state  slapd/         utmp
crond.reboot  mysqld/          rpc.statd.pid  sm-notify.pid  xe-daemon.pid
exim4/        portmap_mapping  rsyslogd.pid   sshd/

portmap was stopped manually, yet portmap.pid is still there. Therefore daemons don't always remove their pid files before they leave, and the files remaining in /var/run do not indicate which daemons refuse to die.
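
Since the mere presence of a pid file proves nothing, one can check whether the recorded PID still exists. A minimal sketch, assuming a Linux /proc and pid files that hold the PID on their first line. The function name check_pidfiles is hypothetical, and pid files in subdirectories such as mysqld/ are not covered.

```shell
#!/bin/sh
# Hypothetical helper: for each *.pid file directly under a directory,
# report whether the PID it records still has an entry in /proc.
check_pidfiles() {
    dir=${1:-/var/run}
    for f in "$dir"/*.pid; do
        [ -f "$f" ] || continue
        # Keep only the digits of the first line.
        pid=$(head -n 1 "$f" | tr -cd '0-9')
        if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
            echo "$f: $pid alive"
        else
            echo "$f: ${pid:-?} stale"
        fi
    done
}
```

An "alive" result only means some process has that PID now; it may be an unrelated process that reused the number.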

Did sendsigs spit out any error message? There were none in /var/log/syslog and /var/log/messages. Another user reported seeing an error from sendsigs on screen while not being able to find it in either log file, so it is not logged there.

I am operating a remote server, so there is no screen for me to see.

His problem may be the same as mine. When he solved it, he posted:
http://forums.debian.net/viewtopic.php?f=5&t=63896

"A check forced of filesystem solved the problem."

I meditated for a while on this "check forced of filesystem"; the grammar isn't correct and the whole sentence makes no sense. Does he mean "reboot -f" to force a reboot? I have tried that and it made no difference compared to "reboot" without "-f". Does he mean manually umounting all non-root filesystems? My /var/local is the only non-root physical filesystem, and it is in use. 'sudo lsof /var/local' hung for 1 hour, so it remains a mystery which process is using it, but accessing its files is fine and error-free. Besides, there are various *umount* scripts in /etc/rc6.d/ and they are all ordered after sendsigs, so they are not supposed to cause problems until sendsigs finishes.
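
When lsof hangs, a cruder way to look for users of a directory tree is to read the symlinks under /proc. This is an untested-on-this-server sketch, assuming a Linux /proc; it only sees the current directory and open file descriptors, and only for processes the caller may inspect (run it as root to see all). The function name users_of is made up, and whether it would have avoided the hang here is not known.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs whose cwd or open file
# descriptors point at or below the given path prefix.
users_of() {
    prefix=$1
    for p in /proc/[0-9]*; do
        for l in "$p"/cwd "$p"/fd/*; do
            # Unreadable or vanished entries are skipped silently.
            t=$(readlink "$l" 2>/dev/null) || continue
            case $t in
                "$prefix"|"$prefix"/*)
                    echo "${p#/proc/}"
                    break   # one hit per process is enough
                    ;;
            esac
        done
    done
}
```

For example, "users_of /var/local" would list one PID per line.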

So a dead end again. Now, as I browsed through the process tree, I found one process that was started 2 weeks ago and should be long dead:


$ ps ax | grep youtu
18380 ?        D      0:03 python /usr/local/bin/youtube-dl

I distantly remember it had been run on an NFS mount which was jammed, and later, because umount was not possible (NFS server gone), I had done a lazy umount:

# umount -l /mnt/nfs

So I believe this one is the culprit. "kill -9" cannot kill it, confirming my guess. https://wiki.debian.org/Kill says if you can't kill with "kill -9", you should reboot, which brings me back to this problem: chicken or egg first?

With no way to kill 18380 but to reboot, and no way to reboot but to kill 18380, I instead killed sendsigs with -TERM. The result was trouble: I was immediately kicked out of my ssh session, the server stopped responding to ping, and half an hour later I capitulated and called the datacenter for a cold reboot.

After the server was online again, I immediately did a reboot and it succeeded. So it is very likely the stalled process 18380 that blocked reboot/shutdown.
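
The "D" in the STAT column of the youtube-dl line above means uninterruptible sleep, which fits a process that "kill -9" cannot remove. A minimal sketch for spotting such processes before attempting a reboot, assuming procps ps on Linux; the function names filter_dstate and list_dstate are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines whose second field (STAT)
# starts with "D". Expects "PID STAT ELAPSED COMMAND..." on stdin.
filter_dstate() {
    awk '$2 ~ /^D/ { print }'
}

# List every process currently in uninterruptible sleep.
# An empty header ("name=") on every column suppresses the header line;
# separate -o options are used to keep the headers unambiguous.
list_dstate() {
    ps -e -o pid= -o stat= -o etime= -o args= | filter_dstate
}
```

Calling "list_dstate" prints one line per stuck process; a process that stays in the list for a long time is a candidate for the kind of stall described here.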


My conclusion so far:

1. If you had an NFS mount, and the NFS server is gone, you cannot umount it unless you reboot, which won't be successful, so you need to do a cold reboot.

2. You can get an NFS mount out of sight with a lazy umount (umount -l), but it is still there, holding any process that uses it. I waited 2 weeks. It could be there forever.


3. If sendsigs cannot kill every process, killing sendsigs itself doesn't help.



Archive: http://lists.debian.org/alpine.DEB.2.10.1402181152490.4922@lyonesse