Package: bind9
Version: 1:9.7.3.dfsg-1~squeeze2
Severity: normal
Tags: patch

Sometimes when I run /etc/init.d/bind9 stop or restart, the init script loops forever printing "waiting for pid ???? to die". Looking at the system, I can see that named is still running, and the logs show this:

Jun 16 10:55:04 ns1 named[1203]: error (unexpected RCODE REFUSED) resolving 'smartadvertising.pt/MX/IN': 188.93.231.1#53
Jun 16 10:55:05 ns1 named[1203]: lame server resolving 'fonsecas.pt' (in 'fonsecas.pt'?): 62.193.206.146#53
Jun 16 10:55:05 ns1 named[1203]: error (network unreachable) resolving 'fonsecas.pt/MX/IN': 2a02:2b8:1:406::724:142#53
Jun 16 10:55:05 ns1 named[1203]: error (network unreachable) resolving 'fonsecas.pt/MX/IN': 2a02:2b8:1:406::724:136#53
Jun 16 10:55:05 ns1 named[1203]: received control channel command 'stop -p'
Jun 16 10:55:05 ns1 named[1203]: shutting down: flushing changes
Jun 16 10:55:05 ns1 named[1203]: stopping command channel on 127.0.0.1#953
Jun 16 10:55:05 ns1 named[1203]: stopping command channel on ::1#953
Jun 16 10:55:05 ns1 named[1203]: no longer listening on ::#53
Jun 16 10:55:05 ns1 named[1203]: no longer listening on 127.0.0.1#53
Jun 16 10:55:05 ns1 named[1203]: no longer listening on 192.168.1.235#53
Jun 16 10:55:05 ns1 named[1203]: lame server resolving 'maxiprojecto.pt' (in 'maxiprojecto.pt'?): 109.71.40.37#53

So named is asked to stop and releases its ports, but never actually exits. Running strace -p $PID shows:

Process 1203 attached - interrupt to quit
futex(0xb6c91bd8, FUTEX_WAIT, 1204, NULL^C <unfinished ...>
Process 1203 detached

My config is nothing unusual, and I have a cloned machine acting as a secondary/slave DNS that doesn't show this problem (maybe it only applies to the master DNS?).

This bug might look unimportant, but it breaks any script that tries to reload the bind9 config and results in a total loss of service.

As a workaround, the init script should wait up to MAXWAIT seconds and, if named has not exited by then, kill it with kill -9. This allows the system to recover.

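A minimal sketch of that workaround as a standalone POSIX shell function, separate from the init script itself. The name wait_or_kill and its return codes are hypothetical and chosen only for illustration; the 60-second default mirrors the MAXWAIT value proposed in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Debian bind9 init script).
# Polls once per second until process $1 exits; after $2 seconds
# (default 60) it sends SIGKILL instead of waiting any longer.
# Note: POSIX sh has no "local", so pid, maxwait and i are globals.
wait_or_kill() {
    pid=$1
    maxwait=${2:-60}
    i=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$i" -ge "$maxwait" ]; then
            kill -9 "$pid" 2>/dev/null
            return 1    # timed out, process was force-killed
        fi
        sleep 1
        i=$((i + 1))
    done
    return 0            # process exited on its own
}
```

In an init script this could be called as wait_or_kill "$pid" "$MAXWAIT" after the normal stop request has been sent. Returning as soon as SIGKILL is sent keeps the loop from running forever even if the signal cannot be delivered.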
The patch is very simple: define MAXWAIT and count how many seconds the init script has been waiting for the daemon to die:

--- bind9	2009-08-17 14:55:27.000000000 +0100
+++ bind9.new	2011-06-16 11:41:11.000000000 +0100
@@ -19,6 +19,7 @@
 # Don't modify this line, change or create /etc/default/bind9.
 OPTIONS=""
 RESOLVCONF=no
+MAXWAIT=60
 
 test -f /etc/default/bind9 && . /etc/default/bind9
 
@@ -89,9 +90,14 @@
                 --pidfile ${PIDFILE} -- $OPTIONS
         fi
         if [ -n $pid ]; then
+            i=0
             while kill -0 $pid 2>/dev/null; do
                 log_progress_msg "waiting for pid $pid to die"
                 sleep 1
+                i=$((i+1))
+                if [ $i -gt $MAXWAIT ] ; then
+                    kill -9 $pid
+                fi
             done
         fi
         log_end_msg 0

-- System Information:
Debian Release: 6.0.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/1 CPU core)
Locale: LANG=pt_PT.UTF-8, LC_CTYPE=pt_PT.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages bind9 depends on:
ii  adduser           3.112+nmu2               add and remove users and groups
ii  bind9utils        1:9.7.3.dfsg-1~squeeze2  Utilities for BIND
ii  debconf [debconf  1.5.36.1                 Debian configuration management sy
ii  libbind9-60       1:9.7.3.dfsg-1~squeeze2  BIND9 Shared Library used by BIND
ii  libc6             2.11.2-10                Embedded GNU C Library: Shared lib
ii  libcap2           1:2.19-3                 support for getting/setting POSIX.
ii  libdb4.8          4.8.30-2                 Berkeley v4.8 Database Libraries [
ii  libdns69          1:9.7.3.dfsg-1~squeeze2  DNS Shared Library used by BIND
ii  libgssapi-krb5-2  1.8.3+dfsg-4             MIT Kerberos runtime libraries - k
ii  libisc62          1:9.7.3.dfsg-1~squeeze2  ISC Shared Library used by BIND
ii  libisccc60        1:9.7.3.dfsg-1~squeeze2  Command Channel Library used by BI
ii  libisccfg62       1:9.7.3.dfsg-1~squeeze2  Config File Handling Library used
ii  libldap-2.4-2     2.4.23-7                 OpenLDAP libraries
ii  liblwres60        1:9.7.3.dfsg-1~squeeze2  Lightweight Resolver Library used
ii  libssl0.9.8       0.9.8o-4squeeze1         SSL shared libraries
ii  libxml2           2.7.8.dfsg-2+squeeze1    GNOME XML library
ii  lsb-base          3.2-23.2squeeze1         Linux Standard Base 3.2 init scrip
ii  net-tools         1.60-23                  The NET-3 networking toolkit
ii  netbase           4.45                     Basic TCP/IP networking system

bind9 recommends no packages.

Versions of packages bind9 suggests:
ii  bind9-doc         1:9.7.3.dfsg-1~squeeze2  Documentation for BIND
ii  dnsutils          1:9.7.3.dfsg-1~squeeze2  Clients provided with BIND
pn  resolvconf        <none>                   (no description available)
pn  ufw               <none>                   (no description available)

-- debconf information excluded