Except BIND does exactly this.  It retries, and if all the servers for the zone 
fail, the <name,type> is flagged as bad for 10 minutes and any validation that 
depends on that lookup fails with DNS_R_BROKENCHAIN, which results in SERVFAIL 
rather than a retry.  This was how we dealt with the so-called “rollover and 
die” issue.

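                } else if (result == DNS_R_BROKENCHAIN) {
                        isc_result_t tresult;
                        isc_time_t expire;
                        isc_interval_t i;

                        /* Bad-cache entry lifetime, in seconds. */
                        isc_interval_set(&i, DNS_RESOLVER_BADCACHETTL(fctx), 0);
                        tresult = isc_time_nowplusinterval(&expire, &i);
                        /*
                         * Only DNSKEY and DS lookups are added to the bad
                         * cache; later validations that depend on them
                         * fail fast until the entry expires.
                         */
                        if (negative &&
                            (fctx->type == dns_rdatatype_dnskey ||
                             fctx->type == dns_rdatatype_ds) &&
                            tresult == ISC_R_SUCCESS)
                        {
                                dns_resolver_addbadcache(res, fctx->name,
                                                         fctx->type, &expire);
                        }
                        /* Broken chain: stop here, no further retry. */
                        done = true;
                        goto cleanup_fetchctx;
                } else {
                        /* Any other failure: retry the query. */
                        fctx_try(fctx, true, true);
                        goto cleanup_fetchctx;
                }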
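
The bad-cache idea above can also be illustrated with a small standalone model.  This is a hypothetical sketch, not BIND's code: badcache_add(), badcache_find(), lookup_for_validation(), the fixed-size table and the stub all_servers_fail() are invented for illustration, and the 600-second TTL simply mirrors the 10 minutes mentioned above.  A DNSKEY or DS lookup for which every server failed is remembered, and any later lookup for the same <name,type> before the entry expires fails immediately instead of going back to the servers.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical bad-cache model, NOT BIND's implementation.
 * Names are assumed to be canonical (lowercase), so strcmp() suffices.
 */
#define BADCACHE_TTL 600 /* seconds: the 10 minutes described above */
#define BADCACHE_SIZE 64

enum { QTYPE_DS = 43, QTYPE_DNSKEY = 48 }; /* IANA RR type codes */
enum result { RESULT_SUCCESS, RESULT_SERVFAIL };

struct badcache_entry {
    char name[256];
    int type;
    time_t expire;
    bool used;
};

static struct badcache_entry badcache[BADCACHE_SIZE];

/* Flag <name,type> as bad until now + BADCACHE_TTL. */
static void badcache_add(const char *name, int type, time_t now) {
    struct badcache_entry *slot = NULL;

    for (size_t i = 0; i < BADCACHE_SIZE; i++) {
        struct badcache_entry *e = &badcache[i];
        if (e->used && e->type == type && strcmp(e->name, name) == 0) {
            slot = e; /* refresh the existing entry */
            break;
        }
        if (slot == NULL && (!e->used || e->expire <= now)) {
            slot = e; /* first free or expired slot */
        }
    }
    if (slot == NULL) {
        slot = &badcache[0]; /* table full: naive eviction */
    }
    snprintf(slot->name, sizeof(slot->name), "%s", name);
    slot->type = type;
    slot->expire = now + BADCACHE_TTL;
    slot->used = true;
}

/* True if <name,type> is flagged as bad and the entry has not yet expired. */
static bool badcache_find(const char *name, int type, time_t now) {
    for (size_t i = 0; i < BADCACHE_SIZE; i++) {
        const struct badcache_entry *e = &badcache[i];
        if (e->used && e->type == type && strcmp(e->name, name) == 0) {
            return e->expire > now;
        }
    }
    return false;
}

/*
 * Lookup on behalf of the validator.  try_all_servers is a hypothetical
 * callback returning true if any server for the zone gave a usable answer.
 */
static enum result lookup_for_validation(const char *name, int type, time_t now,
                                         bool (*try_all_servers)(const char *, int)) {
    if (badcache_find(name, type, now)) {
        return RESULT_SERVFAIL; /* broken chain: fail fast, no retry */
    }
    if (try_all_servers(name, type)) {
        return RESULT_SUCCESS;
    }
    if (type == QTYPE_DNSKEY || type == QTYPE_DS) {
        badcache_add(name, type, now); /* every server failed */
    }
    return RESULT_SERVFAIL;
}

/* Demo stub: every server fails; counts how often the servers are queried. */
static int server_attempts = 0;

static bool all_servers_fail(const char *name, int type) {
    (void)name;
    (void)type;
    server_attempts++;
    return false;
}
```

With this stub, three lookups of example./DNSKEY at t, t+60 and t+600 all return RESULT_SERVFAIL, but only the first and the third actually reach the servers (server_attempts ends at 2); the middle one is answered from the bad cache.  Lookups for other types are never cached in this model, matching the DNSKEY/DS restriction in the excerpt.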

The world doesn’t fall over with limited retries.  We had zero reports of 
resolution failures due to this incident.  This also allows a validator behind 
a validator to work reliably by having the validator that talks directly to the 
authoritative servers filter out the garbage responses.  Always sending CD=1 is 
STUPID.
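
The validator-behind-a-validator argument can be illustrated with a deliberately simplified model.  It is a hypothetical sketch, not how any particular resolver is implemented: it assumes that an upstream validator receiving CD=1 passes through the first answer it gets without validating, while with CD=0 it validates and moves on to the next authoritative server until an answer validates.  The downstream validator only sees what the upstream hands it, so it cannot retry other authoritative servers itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each authoritative server returns a valid or bogus answer. */
enum answer { ANSWER_VALID, ANSWER_BOGUS, ANSWER_SERVFAIL };

/* Upstream validator that talks directly to the authoritative servers. */
static enum answer upstream_resolve(const enum answer *servers, size_t n, bool cd) {
    if (n == 0) {
        return ANSWER_SERVFAIL;
    }
    if (cd) {
        return servers[0]; /* CD=1: no validation, first answer passed through */
    }
    for (size_t i = 0; i < n; i++) {
        if (servers[i] == ANSWER_VALID) {
            return ANSWER_VALID; /* CD=0: bogus answers skipped, next server tried */
        }
    }
    return ANSWER_SERVFAIL; /* no server gave an answer that validates */
}

/* Downstream validator behind the upstream; it validates whatever it receives. */
static enum answer downstream_resolve(const enum answer *servers, size_t n, bool send_cd) {
    enum answer a = upstream_resolve(servers, n, send_cd);
    return a == ANSWER_VALID ? ANSWER_VALID : ANSWER_SERVFAIL;
}
```

With one bogus server listed ahead of a good one, {ANSWER_BOGUS, ANSWER_VALID}, the downstream validator gets ANSWER_VALID when it sends CD=0 but ends up with ANSWER_SERVFAIL when it sends CD=1, because in this model the garbage is only filtered by the validator that can reach the other authoritative servers.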

> On 19 Jul 2023, at 04:54, Ondřej Surý <[email protected]> wrote:
> 
> With my implementor’s hat on, I think this is the wrong approach. It (again) 
> adds complexity to the resolvers, and is yet again based (mostly) on an 
> isolated incident. I really don’t want yet another “serve-stale” in the 
> resolvers. I have yet to see any evidence that serve-stale has helped anything 
> since the original incident, but now every resolver has to have it because 
> people want it.
> 
> And operationally, it will just paper over the issue, which might then go 
> unnoticed for a longer period of time rather than being fixed right away.
> 
> Ondrej
> --
> Ondřej Surý <[email protected]> (He/Him)
> 
>> On 18. 7. 2023, at 20:38, Gavin McCullagh <[email protected]> wrote:
>> 
>> I'd like to reach out to NLNet about changing Unbound to do this, so I want 
>> to make sure people have a chance to disagree.  Feel free to voice your 
>> disagreement (and reasons) here if you do.
> 
> 
> _______________________________________________
> dns-operations mailing list
> [email protected]
> https://lists.dns-oarc.net/mailman/listinfo/dns-operations

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: [email protected]

