Hello,

We have a strange problem in one of our airgapped environments, while the same setup runs in our other environments without the issue. After some time (varying from seconds to hours), the recursor refuses to give any answer other than host unknown (SERVFAIL, if I remember correctly).
Situation: an airgapped environment with 2 DNS servers, each with:

* a recursor listening on the internal interface
* an authoritative server listening on the external interface
* DNS lookups going through the recursor, via an external simulated root server, to the designated authoritative servers

The problem exists in only 1 environment, where the links to the external authoritative servers for the root and other domains are slow (1 Mbit or less) and some zones (including the root) have very interesting NS records (NS records pointing to hostnames with missing A records). For the root zone this has been fixed, but some of the others are still messy.

After a while, the recursor refuses to give answers to any query, no matter whether the DNS server that should answer is configured correctly or not. The only thing that helps in that situation is a restart of the recursor.

With the log level at max (9), all we see at the moment of the issue is that the recursor answers from the packet cache, with no attempts to query externally. The last query in the log is not remarkable either: it just works (valid query) or doesn't (invalid query for domains unknown in the environment), with no indication of throttling, timeouts, missed packets or long response times.

When the problem shows up, dig @<recursor ip> fails. However, the moment we use the +trace option, the dig command works around the recursor after the 1st lookup (NS of .) and gets the answer correctly.

We can't seem to reproduce the error in the other environments, can't get logging that points to the issue (log level 9 is max?), and can't even think of a logical reason why this would happen (apart from throttling).

We've set the option dont-throttle-netmasks to 0.0.0.0/0, which seems to help a lot but does not solve the problem completely. I would try setting non-resolving-ns-max-fails to 0 if we were on 4.5, but alas we're stuck on 4.4 at the moment (no way to upgrade the airgapped environment).
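
To pin down the moment the failures start, one option is a small probe that logs the response code the recursor returns at fixed intervals. The following is only a minimal sketch using the Python standard library. The function names, the address 192.0.2.53 and the name example.org are placeholders and not taken from the setup described above. Note that repeated queries for the same name may simply be answered from the packet cache.

```python
import socket
import struct
import time

# A few common DNS response codes (RFC 1035 header, low 4 bits of the flags word).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query: RD bit set, one question, class IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_rcode(packet):
    """Return the numeric RCODE of a DNS response packet."""
    if len(packet) < 12:
        raise ValueError("short DNS packet")
    flags = struct.unpack("!H", packet[2:4])[0]
    return flags & 0x000F


def probe(server, name, timeout=3.0):
    """Send one UDP query to server:53; return the RCODE name or 'TIMEOUT'."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT"
    code = parse_rcode(data)
    return RCODES.get(code, str(code))


def watch(server, name, interval=5.0):
    """Print a timestamped result line every `interval` seconds, forever."""
    while True:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        print(stamp, name, probe(server, name), flush=True)
        time.sleep(interval)


# Hypothetical usage (placeholder address and name):
# watch("192.0.2.53", "example.org")
```

Running one such probe against each recursor, and comparing the timestamps of the first SERVFAIL with the recursor log, might narrow down which queries precede the failure.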
We need either a way to keep the recursors querying the NS servers to get an answer, or a way to prove which server/environment is the cause of the issue.

--
Jan Huijsmans b...@koffie.nu

... cannot activate /dev/brain, no response from main coffee server
_______________________________________________
Pdns-users mailing list
Pdns-users@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/pdns-users