On 07. 11. 25 18:50, Joe Abley wrote:
Hi all,
I presented today in Montréal about our proposal draft-jabley-dnsop-
zone-cut-to-nowhere.
https://datatracker.ietf.org/doc/draft-jabley-dnsop-zone-cut-to-nowhere/
Mark Andrews repeated his concern that I remember was mentioned at the
mic in Madrid. Mark, let me know whether I have got this right.
TL;DR I see Mark's point, I have tried to write the concern down so that
we have a clear shared picture of what it is, and I have dreamed up a
couple of ways to address it. I haven't yet talked to my co-authors so
the hands that are waving here are my personal hands.
Mark's Concern
Consider a zone cut to nowhere published in the EXAMPLE.COM zone for
CORP.EXAMPLE.COM. Then consider a stub resolver sending the query (QNAME
= CORP.EXAMPLE.COM) to a resolver A on the public Internet where the
private namespace CORP.EXAMPLE.COM is not available, which is configured
to process cache misses by sending queries to resolver B. So this is an
example of a chained resolver processing a query.
stub resolver ---> resolver A ---> resolver B ---> authoritative server
The authoritative server will return a referral of the form
"CORP.EXAMPLE.COM. IN NS ." Resolver B will not be able to follow that
referral because there is (intentionally) no child nameserver mentioned
in it, and will return SERVFAIL to resolver A. Resolver A will repeat
the query to resolver B because SERVFAIL invites retries. Mark's concern
is that this retry behaviour is unnecessary and potentially harmful.
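The chain above can be sketched as a toy model to make the retry amplification concrete. This is only an illustration under stated assumptions: the class names, the `max_attempts` value of 3, and the absence of any caching are all hypothetical, and real resolvers differ in their retry policy.

```python
from dataclasses import dataclass

SERVFAIL = "SERVFAIL"


@dataclass
class Authoritative:
    """Toy authoritative server for EXAMPLE.COM (hypothetical model)."""
    queries_seen: int = 0

    def query(self, qname, qtype):
        self.queries_seen += 1
        # Always answers with the zone cut to nowhere: "IN NS ."
        return ("REFERRAL", ".")


@dataclass
class ResolverB:
    """Full recursive resolver; cannot follow a referral to "."."""
    upstream: Authoritative
    queries_seen: int = 0

    def query(self, qname, qtype):
        self.queries_seen += 1
        kind, target = self.upstream.query(qname, qtype)
        if kind == "REFERRAL" and target == ".":
            return SERVFAIL  # no child nameserver to ask
        return "NOERROR"


@dataclass
class ResolverA:
    """Forwarding resolver that retries on SERVFAIL."""
    upstream: ResolverB
    max_attempts: int = 3  # assumed value, for illustration only

    def query(self, qname, qtype):
        for _ in range(self.max_attempts):
            rcode = self.upstream.query(qname, qtype)
            if rcode != SERVFAIL:
                return rcode
        return SERVFAIL


auth = Authoritative()
resolver_b = ResolverB(auth)
resolver_a = ResolverA(resolver_b)
rcode = resolver_a.query("corp.example.com.", "A")
print(rcode, resolver_b.queries_seen, auth.queries_seen)
```

In this model one stub query turns into `max_attempts` queries at resolver B and, with no SERVFAIL caching, the same number at the authoritative server.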
Addressing Mark's Concern
1. Do nothing, this is not a concern we should worry about. This kind of
junk is all over the DNS, it's already happening with the existing
(described) other uses of hostname ".", not to mention misconfigurations
where hostnames are used but do not resolve properly; we are not going
to eliminate all of these failure modes by doing something different here.
2. Change signalling in zone data. Mark suggests a better response from
the authoritative server would be to return a referral for the
child that includes the same NS set as the parent. This is clearly
possible, but I think it has the disadvantage that it is not as clear
what the intention of the zone administrator is. It also assumes that
the NS set is coherent and doesn't involve "enterprise DNS" trickery. I
am also not sure I would predict that existing deployed resolvers would
interpret such a referral response in a way that would not also result
in SERVFAIL, e.g. following retries by resolver B to all the listed
authoritative servers. This just seems like it invites more random
weirdness, and that the weirdness will be harder to measure.
3. Change resolver behaviour. Resolver B SHOULD return a more useful
signal than SERVFAIL, maybe with an EDE or something. Resolver A SHOULD
avoid retrying if it receives such a signal. This would avoid the
behaviour Mark is concerned about in this draft, but would also clean up
other uses of the hostname ".". The camel is slightly sad, but perhaps
it's worth it to avoid the cost of retries for all the cases where "."
is used in place of a real hostname to mean "not provided".
4. This isn't a big enough problem to care about, all of 1 through 3 are
horrible, let's forget the whole idea.
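As a rough illustration of option 3, the retry decision in resolver A could look something like the sketch below. The `do_not_retry` field is a hypothetical stand-in for whatever signal (an EDE or otherwise) resolver B might return; nothing here is taken from the draft, and `max_attempts` is an assumed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    rcode: str
    # Hypothetical signal from the upstream resolver meaning
    # "this failure is deliberate, retrying will not help".
    do_not_retry: bool = False


def resolve_with_retries(send, max_attempts=3):
    """Call send() until it succeeds, signals do_not_retry, or attempts run out.

    Returns the last response and the number of attempts made.
    """
    attempts = 0
    response = Response("SERVFAIL")
    while attempts < max_attempts:
        attempts += 1
        response = send()
        if response.rcode != "SERVFAIL" or response.do_not_retry:
            break
    return response, attempts


# Plain SERVFAIL invites retries; the hinted one does not.
print(resolve_with_retries(lambda: Response("SERVFAIL"))[1])
print(resolve_with_retries(lambda: Response("SERVFAIL", do_not_retry=True))[1])
```

The point of the sketch is only that the saving depends on both ends: resolver B has to emit the signal and resolver A has to honour it.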
Thoughts on this would be appreciated.
I have to correct what I said at the microphone in Montreal:
I completely missed the angle Mark was talking about. Mark is right.
As soon as there is more than one layer of resolvers this will lead to
retry storms.
The (only?) method which does not suffer from this, which is deployable
in this decade, is a valid delegation to an (empty?) zone, as suggested by
Mark. That zone can have custom TTLs so negative caching can be tuned to
suitable values without affecting the parent.
Case in point:
The CPE I was given by my ISP for my home has two upstream recursive
resolvers configured (by the ISP, not me).
If my laptop's VPN is not up, queries to an internal domain will cause
recursion on one of these ISP resolvers. If that recursion fails (say
because of "NS ." delegation), the CPE resolver will get SERVFAIL and
retry the _other_ upstream resolver, only to get SERVFAIL again.
In BIND itself, the SERVFAIL cache has a 1 second TTL and is keyed by
(class, qname, qtype). Consequently, a typical web browser query for
(portal.internal.example.com A+AAAA+HTTPS) will cause 6 outgoing queries
to the auth and 6 cached SERVFAIL entries for 1 second; rinse and repeat
until I close the browser tab I have forgotten about.
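One way to read the arithmetic is 3 query types times 2 upstream resolvers. The sketch below models that reading; it is not BIND code. The class and function names are made up for illustration, each failed recursion is simplified to a single query to the auth, and the clock is passed in explicitly so the 1 second expiry is easy to see.

```python
class UpstreamResolver:
    """Toy ISP resolver with a SERVFAIL cache keyed by (class, qname, qtype)."""

    def __init__(self, servfail_ttl=1.0):
        self.servfail_ttl = servfail_ttl
        self.servfail_cache = {}  # (qclass, qname, qtype) -> expiry time
        self.auth_queries = 0     # queries sent to the authoritative server

    def query(self, qclass, qname, qtype, now):
        key = (qclass, qname.lower(), qtype)
        expiry = self.servfail_cache.get(key)
        if expiry is not None and now < expiry:
            return "SERVFAIL"  # answered from the SERVFAIL cache
        # Cache miss: recursion reaches the auth, hits "NS ." and fails.
        self.auth_queries += 1
        self.servfail_cache[key] = now + self.servfail_ttl
        return "SERVFAIL"


def cpe_lookup(upstreams, qname, qtype, now):
    """CPE forwarder: on SERVFAIL, try the other upstream resolver."""
    for resolver in upstreams:
        rcode = resolver.query("IN", qname, qtype, now)
        if rcode != "SERVFAIL":
            return rcode
    return "SERVFAIL"


def browser_burst(upstreams, qname, now):
    """One browser attempt: A, AAAA and HTTPS for the same name."""
    for qtype in ("A", "AAAA", "HTTPS"):
        cpe_lookup(upstreams, qname, qtype, now)


def total_auth_queries(upstreams):
    return sum(r.auth_queries for r in upstreams)


upstreams = [UpstreamResolver(), UpstreamResolver()]
browser_burst(upstreams, "portal.internal.example.com", now=0.0)
print(total_auth_queries(upstreams))  # 3 qtypes x 2 resolvers
```

Repeating the burst within the 1 second TTL adds nothing in this model, and repeating it after the TTL has expired adds another 6.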
So yeah, it looked like a nice idea with pleasing aesthetics, but I
think the problems it creates are not worth it.
--
Petr Špaček
_______________________________________________
DNSOP mailing list -- [email protected]