On 07. 11. 25 18:50, Joe Abley wrote:
Hi all,

I presented our proposal draft-jabley-dnsop-zone-cut-to-nowhere today in Montréal.

https://datatracker.ietf.org/doc/draft-jabley-dnsop-zone-cut-to-nowhere/

Mark Andrews repeated the concern that I remember him raising at the mic in Madrid. Mark, let me know whether I have got this right.

TL;DR I see Mark's point, I have tried to write the concern down so that we have a clear shared picture of what it is, and I have dreamed up a couple of ways to address it. I haven't yet talked to my co-authors so the hands that are waving here are my personal hands.

Mark's Concern

Consider a zone cut to nowhere published in the EXAMPLE.COM zone for CORP.EXAMPLE.COM. Then consider a stub resolver sending a query (QNAME = CORP.EXAMPLE.COM) to a resolver A on the public Internet, where the private namespace CORP.EXAMPLE.COM is not available. Resolver A is configured to process cache misses by sending queries to resolver B. So this is an example of a chained resolver processing a query.

stub resolver ---> resolver A ---> resolver B ---> authoritative server

The authoritative server will return a referral of the form "CORP.EXAMPLE.COM. IN NS ." Resolver B will not be able to follow that referral because there is (intentionally) no child nameserver mentioned in it, and will return SERVFAIL to resolver A. Resolver A will repeat the query to resolver B because SERVFAIL invites retries. Mark's concern is that this retry behaviour is unnecessary and potentially harmful.
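For concreteness, the parent-side record behind that referral would look like this in conventional master-file syntax (the TTL is illustrative, not from the draft):

```
; In the EXAMPLE.COM zone: a zone cut to nowhere
CORP.EXAMPLE.COM.  86400  IN  NS  .
```

The referral resolver B receives names no child nameserver at all, so there is nothing for it to follow.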

Addressing Mark's Concern

1. Do nothing; this is not a concern we should worry about. This kind of junk is all over the DNS: it is already happening with the other existing uses of hostname "." described in the draft, not to mention misconfigurations where hostnames are used but do not resolve properly. We are not going to eliminate all of these failure modes by doing something different here.

2. Change signalling in zone data. Mark suggests that a better response from the authoritative server would be a referral for the child that includes the same NS set as the parent. This is clearly possible, but I think it has the disadvantage that the intention of the zone administrator is not as clear. It also assumes that the NS set is coherent and doesn't involve "enterprise DNS" trickery. I am also not sure I would predict that existing deployed resolvers would interpret such a referral response in a way that would not also result in SERVFAIL, e.g. following retries by resolver B to all the listed authoritative servers. This just seems like it invites more random weirdness, and that weirdness will be harder to measure.

3. Change resolver behaviour. Resolver B SHOULD return a more useful signal than SERVFAIL, maybe with an EDE or something. Resolver A SHOULD avoid retrying if it receives such a signal. This would avoid the behaviour Mark is concerned about in this draft, but would also clean up other uses of the hostname ".". The camel is slightly sad, but perhaps it's worth it to avoid the cost of retries for all the cases where "." is used in place of a real hostname to mean "not provided".
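One possible shape for such a signal, sketched here using the Extended DNS Error (EDE) option from RFC 8914. The choice of info-code 22 ("No Reachable Authority") and the extra text are my guesses at what might fit; the draft does not specify any of this:

```python
import struct

EDE_OPTION_CODE = 15            # EDNS option code for Extended DNS Errors (RFC 8914)
EDE_NO_REACHABLE_AUTHORITY = 22 # "No Reachable Authority" info-code

def build_ede_option(info_code: int, extra_text: str = "") -> bytes:
    """Encode one EDE option (OPTION-CODE, OPTION-LENGTH, INFO-CODE, EXTRA-TEXT)
    ready to be carried in the RDATA of an OPT RR."""
    payload = struct.pack(">H", info_code) + extra_text.encode("utf-8")
    return struct.pack(">HH", EDE_OPTION_CODE, len(payload)) + payload

# Resolver B could attach this to its SERVFAIL; a hypothetical resolver A
# that understands it could then skip the pointless retries.
opt = build_ede_option(EDE_NO_REACHABLE_AUTHORITY, "delegation names no servers")
```

Resolver A would still need new logic to treat that particular SERVFAIL as non-retryable, which is exactly the camel-feeding part.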

4. This isn't a big enough problem to care about, all of 1 through 3 are horrible, let's forget the whole idea.

Thoughts on this would be appreciated.
I have to correct what I said at the microphone in Montreal:
I completely missed the angle Mark was talking about. Mark is right.

As soon as there is more than one layer of resolvers this will lead to retry storms.

The (only?) method that does not suffer from this, and that is deployable in this decade, is the valid delegation to an (empty?) zone that Mark suggested. That zone can have custom TTLs, so negative caching can be tuned to suitable values without affecting the parent.
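A sketch of what that might look like in master-file syntax. All names, addresses and TTLs here are illustrative (192.0.2.1 is from the TEST-NET-1 documentation range), not anything Mark or the draft specified:

```
; Parent zone EXAMPLE.COM: a valid delegation instead of "NS ."
corp.example.com.       86400 IN NS blackhole.example.com.
blackhole.example.com.  86400 IN A  192.0.2.1

; Child zone corp.example.com, served by blackhole.example.com.
; The last SOA field (3600 here) bounds negative-caching TTLs (RFC 2308),
; so NXDOMAIN answers can be cached for a tunable interval.
corp.example.com. 3600 IN SOA blackhole.example.com. hostmaster.example.com. (
                          1 3600 900 604800 3600 )
corp.example.com. 3600 IN NS  blackhole.example.com.
```

The resolvers then get ordinary cacheable NXDOMAIN answers instead of an unfollowable referral and SERVFAIL.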

Case in point:
The CPE I was given by my ISP for my home has two upstream recursive resolvers configured (by the ISP, not me).

If my laptop's VPN is not up, queries to an internal domain will cause recursion on one of these ISP resolvers. If that recursion fails (say, because of an "NS ." delegation), the CPE resolver will get SERVFAIL and retry the _other_ upstream resolver, only to get SERVFAIL again.

In BIND itself, the SERVFAIL cache has a 1-second TTL and is keyed by (class, qname, qtype). Consequently, a typical web browser query for (portal.internal.example.com A+AAAA+HTTPS) will cause 6 outgoing queries to the auth and 6 cached SERVFAIL entries lasting 1 second each; rinse and repeat until I close the browser tab I had forgotten about.
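To make the keying concrete, here is a toy model of a SERVFAIL cache keyed by (class, qname, qtype) with a 1-second TTL. This is my reading of the behaviour described above, not BIND's actual implementation:

```python
import time

SERVFAIL_TTL = 1.0  # seconds; matches the BIND default described above

class ServfailCache:
    """Toy SERVFAIL cache: each failure suppresses retries for exactly
    one (class, qname, qtype) tuple, and only until its entry expires."""

    def __init__(self, ttl: float = SERVFAIL_TTL, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # (qclass, qname, qtype) -> expiry timestamp

    def record_failure(self, qclass: str, qname: str, qtype: str) -> None:
        self._entries[(qclass, qname.lower(), qtype)] = self.clock() + self.ttl

    def is_cached(self, qclass: str, qname: str, qtype: str) -> bool:
        key = (qclass, qname.lower(), qtype)
        expiry = self._entries.get(key)
        if expiry is None or self.clock() >= expiry:
            self._entries.pop(key, None)
            return False
        return True

# A browser asking for A, AAAA and HTTPS creates three independent entries;
# none of them shields the others, and each lasts only one second.
cache = ServfailCache()
for qtype in ("A", "AAAA", "HTTPS"):
    cache.record_failure("IN", "portal.internal.example.com", qtype)
```

Because nothing except this short-lived per-tuple entry dampens the failure, the whole resolution chain re-runs every second for as long as the client keeps asking.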

So yeah, it looked like a nice idea with pleasing aesthetics, but I think the problems it creates are not worth it.

--
Petr Špaček

_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]