I filed two issues for Blackbox on GitHub: one for exposing at least the 'tc' flag state as a metric, and one for letting you have Blackbox set EDNS options to increase the allowed reply size (which the underlying Go DNS library that Blackbox uses already supports). I didn't file an issue for UDP to TCP fallback because I suspect that it's out of scope for Blackbox, and in any case it raises design questions such as how the metrics should work (since on a fallback Blackbox would be making two DNS requests).
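
For anyone who wants to see what the 'tc' flag and the EDNS reply size look like on the wire, here is a minimal sketch in Go using only the standard library. It is not how Blackbox or its DNS library does things. The function names (buildQuery, isTruncated) and the 1232-byte payload size are assumptions made up for illustration; the byte layout follows the DNS message format (RFC 1035) and the EDNS(0) OPT pseudo-record (RFC 6891).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

// appendU16 appends v to b in network (big-endian) byte order.
func appendU16(b []byte, v uint16) []byte {
	return append(b, byte(v>>8), byte(v))
}

// buildQuery (hypothetical helper) encodes a minimal DNS query for name.
// If udpSize > 0 it appends an EDNS0 OPT pseudo-record advertising that UDP
// payload size; without one, a server limits UDP replies to 512 bytes.
// The root name itself is not handled, to keep the sketch short.
func buildQuery(id uint16, name string, qtype, udpSize uint16) ([]byte, error) {
	msg := make([]byte, 12, 64) // fixed 12-byte header, zeroed
	binary.BigEndian.PutUint16(msg[0:2], id)
	msg[2] = 0x01                           // flags: RD (recursion desired)
	binary.BigEndian.PutUint16(msg[4:6], 1) // QDCOUNT = 1

	// Question name as length-prefixed labels, ended by the root label.
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		if len(label) == 0 || len(label) > 63 {
			return nil, errors.New("invalid label in name")
		}
		msg = append(msg, byte(len(label)))
		msg = append(msg, label...)
	}
	msg = append(msg, 0)
	msg = appendU16(msg, qtype)
	msg = appendU16(msg, 1) // class IN

	if udpSize > 0 {
		binary.BigEndian.PutUint16(msg[10:12], 1) // ARCOUNT = 1
		msg = append(msg, 0)                      // owner name: root
		msg = appendU16(msg, 41)                  // TYPE = OPT
		msg = appendU16(msg, udpSize)             // CLASS carries the UDP payload size
		msg = append(msg, 0, 0, 0, 0)             // TTL: extended RCODE, version, flags
		msg = append(msg, 0, 0)                   // RDLEN = 0, no options
	}
	return msg, nil
}

// isTruncated (hypothetical helper) reports whether the TC bit is set in a
// raw DNS message, which is what shows up as 'tc' in the logged flags.
func isTruncated(reply []byte) (bool, error) {
	if len(reply) < 12 {
		return false, errors.New("short DNS message")
	}
	return reply[2]&0x02 != 0, nil
}

func main() {
	plain, _ := buildQuery(0x1234, "example.com", 1, 0)
	edns, _ := buildQuery(0x1234, "example.com", 1, 1232)
	fmt.Printf("plain query: %d bytes, with OPT: %d bytes\n", len(plain), len(edns))

	// Header of a reply whose flags would be logged as 'qr tc rd ra'.
	reply := []byte{0x12, 0x34, 0x83, 0x80, 0, 1, 0, 0, 0, 0, 0, 0}
	tc, err := isTruncated(reply)
	fmt.Println("truncated:", tc, "err:", err)
}
```

A client that wanted UDP to TCP fallback would check the TC bit on the UDP reply and, if it is set, repeat the same query over TCP, which is the second DNS request mentioned above.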
For any interested parties, the issues are:

https://github.com/prometheus/blackbox_exporter/issues/1258
https://github.com/prometheus/blackbox_exporter/issues/1259

	- cks

> Thanks for the detailed post. Sounds like a feature request/bug report. I
> would file an issue on GitHub, this should be easily solved.
>
> https://github.com/prometheus/blackbox_exporter/issues
>
> On Wed, Jun 26, 2024 at 12:19 AM Chris Siebenmann <
> [email protected]> wrote:
>
> > To make a long story short, we've been having mysterious probe failures
> > with one of our Blackbox DNS probes against (only) some DNS servers that
> > turned out to be because Blackbox UDP DNS probes have a 512-byte limit
> > on the size of the reply, because Blackbox doesn't currently set EDNS
> > options to increase the allowed reply size and doesn't fall back to a
> > TCP query if the UDP query fails because of truncation. We think this
> > was partially due to these DNS servers using DNS cookies, which
> > increases the reply size.
> >
> > (Our DNS probe checks not just for a successful reply but that the query
> > resolved to at least one A record, so some of the time the reply could
> > be long enough that the truncated version didn't include any of the A
> > records.)
> >
> > Right now the only way to know for sure that your DNS query failed
> > because of truncation is to examine Blackbox probe logs, usually through
> > its web interface (but you can manually query with '..&debug=true'), and
> > notice that one of the log messages reports something like 'flags: qr tc
> > rd ra;' (the 'tc' is the important bit). If you are sure you know how
> > many resource records should be in the various sections of the DNS replies,
> > you can check if the probe got the right number of RRs using the
> > probe_dns_*_rrs metrics.
> >
> > For DNS servers that accept TCP connections, you can work around this by
> > switching your Blackbox DNS module to using TCP instead of the (default)
> > UDP.
> >
> > (I suspect that most people will never run into this, but for our sins
> > we check some external DNS names that have long CNAME chains and other
> > fun things.)
> >
> >     - cks

