Re: Can not get a lick of debug information for a slew of queries that are coming back SERVFAIL

2025-04-24 Thread Bagas Sanjaya
On Tue, Apr 22, 2025 at 01:27:11AM -0700, The Gorf wrote:
> bind version:BIND 9.18.33-1~deb12u2-Debian (Extended Support Version) 
> 
> So it's pretty simple. Im getting a ton of SERVFAILs randomly and I can
> neither figure out why nor figure out how to get bind to give me debug
> information about why. The documentation here:
> https://kb.isc.org/docs/aa-01526
> Hurts my head to read and references stuff that I can't figure out where in
> the documentation it is to explain what it is.
> 
> I took the exact document on that page and switched everything to "debug"
> and I have reset the server, rebooted the server, nothing. Here is the
> exact one line bind produces in its logs for the query that is generating
> the SERVFAIL:
> 
> 22-Apr-2025 01:08:17.138 queries: info: client @0x7ffa3cb78168
> 192.168.8.104#47099 (ksc.wiki): query: ksc.wiki IN A + (10.30.160.20)

I get NXDOMAIN instead on my forwarding resolver setup:

```
$ dig ksc.wiki @127.0.0.1

; <<>> DiG 9.20.8 <<>> ksc.wiki @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 25483
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 56f6630daeee41d50100680a37423a9b5387cc8d8e61 (good)
;; QUESTION SECTION:
;ksc.wiki.  IN  A

;; AUTHORITY SECTION:
wiki.   900 IN  SOA a.nic.wiki. 
admin.tldns.godaddy. 1745499640 1800 300 604800 1800

;; Query time: 700 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Thu Apr 24 20:06:10 WIB 2025
;; MSG SIZE  rcvd: 126
```

> 
> Nothing. And here is the query-errors file:
> geoff@NS1:/var/log/named$ cat query-errors | grep -i wiki
> geoff@NS1:/var/log/named$
> zilch.
> 
> So. what am I doing wrong here? Why can't I get any debug info out of this?
> 
> logging {
>  channel default_log {
>   file "/var/log/named/default" versions 3 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
>  channel auth_servers_log {
>   file "/var/log/named/auth_servers" versions 100 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
>  channel dnssec_log {
>   file "/var/log/named/dnssec" versions 3 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
>  channel zone_transfers_log {
>   file "/var/log/named/zone_transfers" versions 3 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
>  channel ddns_log {
>   file "/var/log/named/ddns" versions 3 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
>  channel client_security_log {
>   file "/var/log/named/client_security" versions 3 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
>  channel rate_limiting_log {
>   file "/var/log/named/rate_limiting" versions 3 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
>  channel rpz_log {
>   file "/var/log/named/rpz" versions 3 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
>  channel dnstap_log {
>   file "/var/log/named/dnstap" versions 3 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
> //
> // If you have the category ‘queries’ defined, and you don’t want query
> logging
> // by default, make sure you add option ‘querylog no;’ - then you can toggle
> // query logging on (and off again) using command ‘rndc querylog’
> //
>  channel queries_log {
>   file "/var/log/named/queries" versions 600 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
> //
> // This channel is dynamic so that when the debug level is increased using
> // rndc while the server is running, extra information will be logged about
> // failing queries.  Other debug information for other categories will be
> // sent to the channel default_debug (which is also dynamic), but without
> // affecting the regular logging.
> //
>  channel query-errors_log {
>   file "/var/log/named/query-errors" versions 5 size 20m;
>   print-time yes;
>   print-category yes;
>   print-severity yes;
>   severity debug;
>  };
> //
> // This is the default syslog ch

bind sends back server failure when local cache expired ( glue record)

2025-04-24 Thread Florian Schlums

Dear list

I'm running several bind caching resolver based on Ubuntu latest bind 
release 9.18.30.
Configuration is pretty simple. A few public IP prefixes are allowed to 
use these server as recursive resolver.
All other prefixes are no allowed to use them. The setup is up for 
several years and works more or less without problems.


Now I have a case I have no explanation for.
It's about a glue record and expired cache behavior: crane.smokva.net
In some cases "dig @ns2.ggamaur.net crane.smokva.net" gives me a 
SERVFAIL back. This happens when TTL in servers local cache has expired. 
But this answer will appear only once, a second dig gives me the IP.


#dig @ns2.ggamaur.net crane.smokva.net

; <<>> DiG 9.18.30-0ubuntu0.24.04.2-Ubuntu <<>> @ns2.ggamaur.net 
crane.smokva.net

; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9174
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: f81401b79354e29b01006809fd983d7daeae1c6bfada (good)
;; QUESTION SECTION:
;crane.smokva.net.    IN    A

;; ANSWER SECTION:
crane.smokva.net.    26    IN    A    85.10.196.166

;; Query time: 1 msec
;; SERVER: 213.160.40.34#53(ns2.ggamaur.net) (UDP)
;; WHEN: Thu Apr 24 11:00:08 CEST 2025
;; MSG SIZE  rcvd: 89

#dig @ns2.ggamaur.net crane.smokva.net

; <<>> DiG 9.18.30-0ubuntu0.24.04.2-Ubuntu <<>> @ns2.ggamaur.net 
crane.smokva.net

; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 
26109    < Cache expired

;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d2c192e8c153ff6501006809fdc00ff4b74c1bc6a88a (good)
;; QUESTION SECTION:
;crane.smokva.net.    IN    A

;; Query time: 1 msec
;; SERVER: 213.160.40.34#53(ns2.ggamaur.net) (UDP)
;; WHEN: Thu Apr 24 11:00:48 CEST 2025
;; MSG SIZE  rcvd: 73

#dig @ns2.ggamaur.net crane.smokva.net

; <<>> DiG 9.18.30-0ubuntu0.24.04.2-Ubuntu <<>> @ns2.ggamaur.net 
crane.smokva.net

; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23097
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 7573634154fc104a01006809fdfe3b4159d1878e28be (good)
;; QUESTION SECTION:
;crane.smokva.net.    IN    A

;; ANSWER SECTION:
crane.smokva.net.    300    IN    A    85.10.196.166

;; Query time: 11 msec
;; SERVER: 213.160.40.34#53(ns2.ggamaur.net) (UDP)
;; WHEN: Thu Apr 24 11:01:50 CEST 2025
;; MSG SIZE  rcvd: 89

In detail, wireshark shows me the following when a local cache entry has 
expired.


No.    Time Source  Destination    Protocol 
    Length    Info

# query to local bind server
599    2025-04-24 08:34:32.084611    213.160.41.17 
213.160.41.10        DNS    87 Standard query 0x5a68 A 
crane.smokva.net OPT

# server sends query to rootserver
600    2025-04-24 08:34:32.086197    2a02:5c0:1:11::10 
2001:500:2d::d       DNS    119 Standard query 0xf931 A 
crane.smokva.net OPT
601    2025-04-24 08:34:32.086318    2a02:5c0:1:11::10 
2001:500:2d::d       DNS    119 Standard query 0x7c1b 
 crane.smokva.net OPT

# server sends server failure as an answer to client
602    2025-04-24 08:34:32.086334    213.160.41.10 
213.160.41.17    DNS    87 Standard query response 
0x5a68 Server failure A crane.smokva.net OPT

# answer from rootserver
603    2025-04-24 08:34:32.087883    2001:500:2d::d 
2a02:5c0:1:11::10    DNS    1235 Standard query response 
0x7c1b  crane.smokva.net NS a.gtld-servers.net NS
604    2025-04-24 08:34:32.087883    2001:500:2d::d 
2a02:5c0:1:11::10    DNS    1235 Standard query response 
0xf931 A crane.smokva.net NS a.gtld-servers.net NS

# server queries .net server
605    2025-04-24 08:34:32.089329    2a02:5c0:1:11::10 
2001:503:231d::2:30  DNS    119 Standard query 0x18a7 
 crane.smokva.net OPT
606    2025-04-24 08:34:32.089399    2a02:5c0:1:11::10 
2001:503:231d::2:30  DNS    119 Standard query 0x88f8 A 
crane.smokva.net OPT

# answer from .net server
607    2025-04-24 08:34:32.091282    2001:503:231d::2:30 
2a02:5c0:1:11::10    DNS    494 Standard query response 
0x88f8 A crane.smokva.net NS crane.smokva.net
608    2025-04-24 08:34:32.091283    2001:503:231d::2:30 
2a02:5c0:1:11::10    DNS    494 Standard query response 
0x18a7  crane.smokva.net NS crane.smokva.net

# server queries to crane.smokva.net
609    2025-04-24 08:34:32.091815    213.160.41.10 
85.10.196.166    DNS    99 Standard query 0x1bda A 
crane.smokva.net OPT
610 

Re: Can not get a lick of debug information for a slew of queries that are coming back SERVFAIL

2025-04-24 Thread The Gorf
Currently set at 99 I guess:

geoff@NS1:~$ sudo rndc status | grep debug
debug level: 99

On Tue, Apr 22, 2025 at 9:47 PM Crist Clark 
wrote:

> What debug level is your server running at?
>
> $ rndc status | grep debug
>
> Do you need to raise the debug level?
>
> $ rndc trace 
>
>
> On Tue, Apr 22, 2025 at 1:27 AM The Gorf 
> wrote:
>
>> bind version:BIND 9.18.33-1~deb12u2-Debian (Extended Support Version)
>> 
>>
>> So it's pretty simple. Im getting a ton of SERVFAILs randomly and I can
>> neither figure out why nor figure out how to get bind to give me debug
>> information about why. The documentation here:
>> https://kb.isc.org/docs/aa-01526
>> Hurts my head to read and references stuff that I can't figure out where
>> in the documentation it is to explain what it is.
>>
>> I took the exact document on that page and switched everything to "debug"
>> and I have reset the server, rebooted the server, nothing. Here is the
>> exact one line bind produces in its logs for the query that is generating
>> the SERVFAIL:
>>
>> 22-Apr-2025 01:08:17.138 queries: info: client @0x7ffa3cb78168
>> 192.168.8.104#47099 (ksc.wiki): query: ksc.wiki IN A + (10.30.160.20)
>>
>> Nothing. And here is the query-errors file:
>> geoff@NS1:/var/log/named$ cat query-errors | grep -i wiki
>> geoff@NS1:/var/log/named$
>> zilch.
>>
>> So. what am I doing wrong here? Why can't I get any debug info out of
>> this?
>>
>> logging {
>>  channel default_log {
>>   file "/var/log/named/default" versions 3 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>>  channel auth_servers_log {
>>   file "/var/log/named/auth_servers" versions 100 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>>  channel dnssec_log {
>>   file "/var/log/named/dnssec" versions 3 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>>  channel zone_transfers_log {
>>   file "/var/log/named/zone_transfers" versions 3 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>>  channel ddns_log {
>>   file "/var/log/named/ddns" versions 3 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>>  channel client_security_log {
>>   file "/var/log/named/client_security" versions 3 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>>  channel rate_limiting_log {
>>   file "/var/log/named/rate_limiting" versions 3 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>>  channel rpz_log {
>>   file "/var/log/named/rpz" versions 3 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>>  channel dnstap_log {
>>   file "/var/log/named/dnstap" versions 3 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>> //
>> // If you have the category ‘queries’ defined, and you don’t want query
>> logging
>> // by default, make sure you add option ‘querylog no;’ - then you can
>> toggle
>> // query logging on (and off again) using command ‘rndc querylog’
>> //
>>  channel queries_log {
>>   file "/var/log/named/queries" versions 600 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>> //
>> // This channel is dynamic so that when the debug level is increased using
>> // rndc while the server is running, extra information will be logged
>> about
>> // failing queries.  Other debug information for other categories will be
>> // sent to the channel default_debug (which is also dynamic), but without
>> // affecting the regular logging.
>> //
>>  channel query-errors_log {
>>   file "/var/log/named/query-errors" versions 5 size 20m;
>>   print-time yes;
>>   print-category yes;
>>   print-severity yes;
>>   severity debug;
>>  };
>> //
>> // This is the default syslog channel, defined here for clarity.  You
>> don’t
>> // have to use it if you prefer to log to your own channels.
>> // It sends to syslog’s daemon facility, and sends only logged messages
>> // of priority info and higher.
>> // (The options to print time, category and severity are non-default.)
>> //
>>  channel default_syslog {
>>   print-t