On 07. 11. 25 18:50, Joe Abley wrote:
Hi all,

I presented our proposal draft-jabley-dnsop-zone-cut-to-nowhere today in Montréal.

https://datatracker.ietf.org/doc/draft-jabley-dnsop-zone-cut-to-nowhere/

Mark Andrews repeated the concern that I remember him raising at the mic in Madrid. Mark, let me know whether I have got this right.

TL;DR I see Mark's point, I have tried to write the concern down so that we have a clear shared picture of what it is, and I have dreamed up a couple of ways to address it. I haven't yet talked to my co-authors so the hands that are waving here are my personal hands.

Mark's Concern

Consider a zone cut to nowhere published in the EXAMPLE.COM zone for CORP.EXAMPLE.COM. Then consider a stub resolver sending a query (QNAME = CORP.EXAMPLE.COM) to a resolver A on the public Internet, where the private namespace CORP.EXAMPLE.COM is not available. Resolver A is configured to process cache misses by sending queries to resolver B. So this is an example of a chained resolver processing a query.

stub resolver ---> resolver A ---> resolver B ---> authoritative server

The authoritative server will return a referral of the form "CORP.EXAMPLE.COM. IN NS ." Resolver B will not be able to follow that referral because there is (intentionally) no child nameserver mentioned in it, and will return SERVFAIL to resolver A. Resolver A will repeat the query to resolver B because SERVFAIL invites retries. Mark's concern is that this retry behaviour is unnecessary and potentially harmful.
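For concreteness, the parent-side record behind that referral would look like this in conventional master-file syntax (the TTL is illustrative, not from the draft):

```
; In the EXAMPLE.COM zone: a zone cut to nowhere
CORP.EXAMPLE.COM.  86400  IN  NS  .
```

The referral resolver B receives names no child nameserver at all, so there is nothing for it to follow.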

Addressing Mark's Concern

1. Do nothing; this is not a concern we should worry about. This kind of junk is all over the DNS: it is already happening with the other existing uses of hostname "." described in the draft, not to mention misconfigurations where hostnames are used but do not resolve properly. We are not going to eliminate all of these failure modes by doing something different here.

2. Change signalling in zone data. Mark suggests that a better response from the authoritative server would be a referral for the child that includes the same NS set as the parent. This is clearly possible, but I think it has the disadvantage that the intention of the zone administrator is not as clear. It also assumes that the NS set is coherent and doesn't involve "enterprise DNS" trickery. I am also not sure I would predict that existing deployed resolvers would interpret such a referral response in a way that would not also result in SERVFAIL, e.g. following retries by resolver B to all the listed authoritative servers. This just seems like it invites more random weirdness, and that weirdness will be harder to measure.

3. Change resolver behaviour. Resolver B SHOULD return a more useful signal than SERVFAIL, maybe with an EDE or something. Resolver A SHOULD avoid retrying if it receives such a signal. This would avoid the behaviour Mark is concerned about in this draft, but would also clean up other uses of the hostname ".". The camel is slightly sad, but perhaps it's worth it to avoid the cost of retries for all the cases where "." is used in place of a real hostname to mean "not provided".
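One possible shape for such a signal, sketched here using the Extended DNS Error (EDE) option from RFC 8914. The choice of info-code 22 ("No Reachable Authority") and the extra text are my guesses at what might fit; the draft does not specify any of this:

```python
import struct

EDE_OPTION_CODE = 15            # EDNS option code for Extended DNS Errors (RFC 8914)
EDE_NO_REACHABLE_AUTHORITY = 22 # "No Reachable Authority" info-code

def build_ede_option(info_code: int, extra_text: str = "") -> bytes:
    """Encode one EDE option (OPTION-CODE, OPTION-LENGTH, INFO-CODE, EXTRA-TEXT)
    ready to be carried in the RDATA of an OPT RR."""
    payload = struct.pack(">H", info_code) + extra_text.encode("utf-8")
    return struct.pack(">HH", EDE_OPTION_CODE, len(payload)) + payload

# Resolver B could attach this to its SERVFAIL; a hypothetical resolver A
# that understands it could then skip the pointless retries.
opt = build_ede_option(EDE_NO_REACHABLE_AUTHORITY, "delegation names no servers")
```

Resolver A would still need new logic to treat that particular SERVFAIL as non-retryable, which is exactly the camel-feeding part.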

4. This isn't a big enough problem to care about, all of 1 through 3 are horrible, let's forget the whole idea.

Thoughts on this would be appreciated.
I have to correct what I said at the microphone in Montreal:
I completely missed the angle Mark was talking about. Mark is right.

As soon as there is more than one layer of resolvers this will lead to retry storms.

The (only?) method that does not suffer from this, and that is deployable in this decade, is the valid delegation to an (empty?) zone that Mark suggested. That zone can have custom TTLs, so negative caching can be tuned to suitable values without affecting the parent.
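A sketch of what that might look like in master-file syntax. All names, addresses and TTLs here are illustrative (192.0.2.1 is from the TEST-NET-1 documentation range), not anything Mark or the draft specified:

```
; Parent zone EXAMPLE.COM: a valid delegation instead of "NS ."
corp.example.com.       86400 IN NS blackhole.example.com.
blackhole.example.com.  86400 IN A  192.0.2.1

; Child zone corp.example.com, served by blackhole.example.com.
; The last SOA field (3600 here) bounds negative-caching TTLs (RFC 2308),
; so NXDOMAIN answers can be cached for a tunable interval.
corp.example.com. 3600 IN SOA blackhole.example.com. hostmaster.example.com. (
                          1 3600 900 604800 3600 )
corp.example.com. 3600 IN NS  blackhole.example.com.
```

The resolvers then get ordinary cacheable NXDOMAIN answers instead of an unfollowable referral and SERVFAIL.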

Case in point:
The CPE I was given by my ISP for my home has two upstream recursive resolvers configured (by the ISP, not me).

If my laptop's VPN is not up, queries to an internal domain will cause recursion on one of these ISP resolvers. If that recursion fails (say, because of an "NS ." delegation), the CPE resolver will get SERVFAIL and retry the _other_ upstream resolver, only to get SERVFAIL again.

In BIND itself, the SERVFAIL cache has a 1-second TTL and is keyed by (class, qname, qtype). Consequently, a typical web browser query for (portal.internal.example.com A+AAAA+HTTPS) will cause 6 outgoing queries to the auth and 6 cached SERVFAIL entries lasting 1 second each; rinse and repeat until I close the browser tab I had forgotten about.
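To make the keying concrete, here is a toy model of a SERVFAIL cache keyed by (class, qname, qtype) with a 1-second TTL. This is my reading of the behaviour described above, not BIND's actual implementation:

```python
import time

SERVFAIL_TTL = 1.0  # seconds; matches the BIND default described above

class ServfailCache:
    """Toy SERVFAIL cache: each failure suppresses retries for exactly
    one (class, qname, qtype) tuple, and only until its entry expires."""

    def __init__(self, ttl: float = SERVFAIL_TTL, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # (qclass, qname, qtype) -> expiry timestamp

    def record_failure(self, qclass: str, qname: str, qtype: str) -> None:
        self._entries[(qclass, qname.lower(), qtype)] = self.clock() + self.ttl

    def is_cached(self, qclass: str, qname: str, qtype: str) -> bool:
        key = (qclass, qname.lower(), qtype)
        expiry = self._entries.get(key)
        if expiry is None or self.clock() >= expiry:
            self._entries.pop(key, None)
            return False
        return True

# A browser asking for A, AAAA and HTTPS creates three independent entries;
# none of them shields the others, and each lasts only one second.
cache = ServfailCache()
for qtype in ("A", "AAAA", "HTTPS"):
    cache.record_failure("IN", "portal.internal.example.com", qtype)
```

Because nothing except this short-lived per-tuple entry dampens the failure, the whole resolution chain re-runs every second for as long as the client keeps asking.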

So yeah, it looked like a nice idea with pleasing aesthetics, but I think the problems it creates are not worth it.

--
Petr Špaček

_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]