Re: Can not get a lick of debug information for a slew of queries that are coming back SERVFAIL
On Tue, Apr 22, 2025 at 01:27:11AM -0700, The Gorf wrote: > bind version:BIND 9.18.33-1~deb12u2-Debian (Extended Support Version) > > So it's pretty simple. Im getting a ton of SERVFAILs randomly and I can > neither figure out why nor figure out how to get bind to give me debug > information about why. The documentation here: > https://kb.isc.org/docs/aa-01526 > Hurts my head to read and references stuff that I can't figure out where in > the documentation it is to explain what it is. > > I took the exact document on that page and switched everything to "debug" > and I have reset the server, rebooted the server, nothing. Here is the > exact one line bind produces in its logs for the query that is generating > the SERVFAIL: > > 22-Apr-2025 01:08:17.138 queries: info: client @0x7ffa3cb78168 > 192.168.8.104#47099 (ksc.wiki): query: ksc.wiki IN A + (10.30.160.20) I get NXDOMAIN instead on my forwarding resolver setup: ``` $ dig ksc.wiki @127.0.0.1 ; <<>> DiG 9.20.8 <<>> ksc.wiki @127.0.0.1 ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 25483 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 1232 ; COOKIE: 56f6630daeee41d50100680a37423a9b5387cc8d8e61 (good) ;; QUESTION SECTION: ;ksc.wiki. IN A ;; AUTHORITY SECTION: wiki. 900 IN SOA a.nic.wiki. admin.tldns.godaddy. 1745499640 1800 300 604800 1800 ;; Query time: 700 msec ;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP) ;; WHEN: Thu Apr 24 20:06:10 WIB 2025 ;; MSG SIZE rcvd: 126 ``` > > Nothing. And here is the query-errors file: > geoff@NS1:/var/log/named$ cat query-errors | grep -i wiki > geoff@NS1:/var/log/named$ > zilch. > > So. what am I doing wrong here? Why can't I get any debug info out of this? > > logging { > channel default_log { > file "/var/log/named/default" versions 3 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > channel auth_servers_log { > file "/var/log/named/auth_servers" versions 100 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > channel dnssec_log { > file "/var/log/named/dnssec" versions 3 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > channel zone_transfers_log { > file "/var/log/named/zone_transfers" versions 3 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > channel ddns_log { > file "/var/log/named/ddns" versions 3 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > channel client_security_log { > file "/var/log/named/client_security" versions 3 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > channel rate_limiting_log { > file "/var/log/named/rate_limiting" versions 3 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > channel rpz_log { > file "/var/log/named/rpz" versions 3 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > channel dnstap_log { > file "/var/log/named/dnstap" versions 3 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > // > // If you have the category ‘queries’ defined, and you don’t want query > logging > // by default, make sure you add option ‘querylog no;’ - then you can toggle > // query logging on (and off again) using command ‘rndc querylog’ > // > channel queries_log { > file "/var/log/named/queries" versions 600 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > // > // This channel is dynamic so that when the debug level is increased using > // rndc while the server is running, extra information will be logged about > // failing queries. Other debug information for other categories will be > // sent to the channel default_debug (which is also dynamic), but without > // affecting the regular logging. > // > channel query-errors_log { > file "/var/log/named/query-errors" versions 5 size 20m; > print-time yes; > print-category yes; > print-severity yes; > severity debug; > }; > // > // This is the default syslog ch
bind sends back server failure when local cache expired ( glue record)
Dear list I'm running several bind caching resolver based on Ubuntu latest bind release 9.18.30. Configuration is pretty simple. A few public IP prefixes are allowed to use these server as recursive resolver. All other prefixes are no allowed to use them. The setup is up for several years and works more or less without problems. Now I have a case I have no explanation for. It's about a glue record and expired cache behavior: crane.smokva.net In some cases "dig @ns2.ggamaur.net crane.smokva.net" gives me a SERVFAIL back. This happens when TTL in servers local cache has expired. But this answer will appear only once, a second dig gives me the IP. #dig @ns2.ggamaur.net crane.smokva.net ; <<>> DiG 9.18.30-0ubuntu0.24.04.2-Ubuntu <<>> @ns2.ggamaur.net crane.smokva.net ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9174 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 1232 ; COOKIE: f81401b79354e29b01006809fd983d7daeae1c6bfada (good) ;; QUESTION SECTION: ;crane.smokva.net. IN A ;; ANSWER SECTION: crane.smokva.net. 26 IN A 85.10.196.166 ;; Query time: 1 msec ;; SERVER: 213.160.40.34#53(ns2.ggamaur.net) (UDP) ;; WHEN: Thu Apr 24 11:00:08 CEST 2025 ;; MSG SIZE rcvd: 89 #dig @ns2.ggamaur.net crane.smokva.net ; <<>> DiG 9.18.30-0ubuntu0.24.04.2-Ubuntu <<>> @ns2.ggamaur.net crane.smokva.net ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 26109 < Cache expired ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 1232 ; COOKIE: d2c192e8c153ff6501006809fdc00ff4b74c1bc6a88a (good) ;; QUESTION SECTION: ;crane.smokva.net. IN A ;; Query time: 1 msec ;; SERVER: 213.160.40.34#53(ns2.ggamaur.net) (UDP) ;; WHEN: Thu Apr 24 11:00:48 CEST 2025 ;; MSG SIZE rcvd: 73 #dig @ns2.ggamaur.net crane.smokva.net ; <<>> DiG 9.18.30-0ubuntu0.24.04.2-Ubuntu <<>> @ns2.ggamaur.net crane.smokva.net ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23097 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 1232 ; COOKIE: 7573634154fc104a01006809fdfe3b4159d1878e28be (good) ;; QUESTION SECTION: ;crane.smokva.net. IN A ;; ANSWER SECTION: crane.smokva.net. 300 IN A 85.10.196.166 ;; Query time: 11 msec ;; SERVER: 213.160.40.34#53(ns2.ggamaur.net) (UDP) ;; WHEN: Thu Apr 24 11:01:50 CEST 2025 ;; MSG SIZE rcvd: 89 In detail, wireshark shows me the following when a local cache entry has expired. No. Time Source Destination Protocol Length Info # query to local bind server 599 2025-04-24 08:34:32.084611 213.160.41.17 213.160.41.10 DNS 87 Standard query 0x5a68 A crane.smokva.net OPT # server sends query to rootserver 600 2025-04-24 08:34:32.086197 2a02:5c0:1:11::10 2001:500:2d::d DNS 119 Standard query 0xf931 A crane.smokva.net OPT 601 2025-04-24 08:34:32.086318 2a02:5c0:1:11::10 2001:500:2d::d DNS 119 Standard query 0x7c1b crane.smokva.net OPT # server sends server failure as an answer to client 602 2025-04-24 08:34:32.086334 213.160.41.10 213.160.41.17 DNS 87 Standard query response 0x5a68 Server failure A crane.smokva.net OPT # answer from rootserver 603 2025-04-24 08:34:32.087883 2001:500:2d::d 2a02:5c0:1:11::10 DNS 1235 Standard query response 0x7c1b crane.smokva.net NS a.gtld-servers.net NS 604 2025-04-24 08:34:32.087883 2001:500:2d::d 2a02:5c0:1:11::10 DNS 1235 Standard query response 0xf931 A crane.smokva.net NS a.gtld-servers.net NS # server queries .net server 605 2025-04-24 08:34:32.089329 2a02:5c0:1:11::10 2001:503:231d::2:30 DNS 119 Standard query 0x18a7 crane.smokva.net OPT 606 2025-04-24 08:34:32.089399 2a02:5c0:1:11::10 2001:503:231d::2:30 DNS 119 Standard query 0x88f8 A crane.smokva.net OPT # answer from .net server 607 2025-04-24 08:34:32.091282 2001:503:231d::2:30 2a02:5c0:1:11::10 DNS 494 Standard query response 0x88f8 A crane.smokva.net NS crane.smokva.net 608 2025-04-24 08:34:32.091283 2001:503:231d::2:30 2a02:5c0:1:11::10 DNS 494 Standard query response 0x18a7 crane.smokva.net NS crane.smokva.net # server queries to crane.smokva.net 609 2025-04-24 08:34:32.091815 213.160.41.10 85.10.196.166 DNS 99 Standard query 0x1bda A crane.smokva.net OPT 610
Re: Can not get a lick of debug information for a slew of queries that are coming back SERVFAIL
Currently set at 99 I guess: geoff@NS1:~$ sudo rndc status | grep debug debug level: 99 On Tue, Apr 22, 2025 at 9:47 PM Crist Clark wrote: > What debug level is your server running at? > > $ rndc status | grep debug > > Do you need to raise the debug level? > > $ rndc trace > > > On Tue, Apr 22, 2025 at 1:27 AM The Gorf > wrote: > >> bind version:BIND 9.18.33-1~deb12u2-Debian (Extended Support Version) >> >> >> So it's pretty simple. Im getting a ton of SERVFAILs randomly and I can >> neither figure out why nor figure out how to get bind to give me debug >> information about why. The documentation here: >> https://kb.isc.org/docs/aa-01526 >> Hurts my head to read and references stuff that I can't figure out where >> in the documentation it is to explain what it is. >> >> I took the exact document on that page and switched everything to "debug" >> and I have reset the server, rebooted the server, nothing. Here is the >> exact one line bind produces in its logs for the query that is generating >> the SERVFAIL: >> >> 22-Apr-2025 01:08:17.138 queries: info: client @0x7ffa3cb78168 >> 192.168.8.104#47099 (ksc.wiki): query: ksc.wiki IN A + (10.30.160.20) >> >> Nothing. And here is the query-errors file: >> geoff@NS1:/var/log/named$ cat query-errors | grep -i wiki >> geoff@NS1:/var/log/named$ >> zilch. >> >> So. what am I doing wrong here? Why can't I get any debug info out of >> this? >> >> logging { >> channel default_log { >> file "/var/log/named/default" versions 3 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> channel auth_servers_log { >> file "/var/log/named/auth_servers" versions 100 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> channel dnssec_log { >> file "/var/log/named/dnssec" versions 3 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> channel zone_transfers_log { >> file "/var/log/named/zone_transfers" versions 3 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> channel ddns_log { >> file "/var/log/named/ddns" versions 3 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> channel client_security_log { >> file "/var/log/named/client_security" versions 3 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> channel rate_limiting_log { >> file "/var/log/named/rate_limiting" versions 3 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> channel rpz_log { >> file "/var/log/named/rpz" versions 3 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> channel dnstap_log { >> file "/var/log/named/dnstap" versions 3 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> // >> // If you have the category ‘queries’ defined, and you don’t want query >> logging >> // by default, make sure you add option ‘querylog no;’ - then you can >> toggle >> // query logging on (and off again) using command ‘rndc querylog’ >> // >> channel queries_log { >> file "/var/log/named/queries" versions 600 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> // >> // This channel is dynamic so that when the debug level is increased using >> // rndc while the server is running, extra information will be logged >> about >> // failing queries. Other debug information for other categories will be >> // sent to the channel default_debug (which is also dynamic), but without >> // affecting the regular logging. >> // >> channel query-errors_log { >> file "/var/log/named/query-errors" versions 5 size 20m; >> print-time yes; >> print-category yes; >> print-severity yes; >> severity debug; >> }; >> // >> // This is the default syslog channel, defined here for clarity. You >> don’t >> // have to use it if you prefer to log to your own channels. >> // It sends to syslog’s daemon facility, and sends only logged messages >> // of priority info and higher. >> // (The options to print time, category and severity are non-default.) >> // >> channel default_syslog { >> print-t