On Jul 13, 2025, at 4:43 PM, Tommy Jensen <[email protected]> wrote:
...
For logging and audit purposes the standard practice is to use PTR records, 
which are more precise than CIDRS and have clearer semantics.

Tying into what I've said further down: this implies JIT queries at every 
connection. For enforcement, that introduces unnecessary latency at connection 
time wherever it is deployed, versus independently refreshing the candidate 
CIDRs so they can be consulted just-in-time as a local cache.
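
Concretely, the pattern I mean is something like the sketch below, where the 
candidate CIDRs are refreshed out-of-band and the per-connection check is a 
purely local lookup. fetch_cidrs() is a placeholder for however the ranges 
are actually obtained; none of these names come from the draft.

    # Hypothetical sketch: background refresh of candidate CIDRs, consulted
    # JIT as a local cache at connection time (no DNS query on the path).
    import ipaddress
    import threading
    import time

    _lock = threading.Lock()
    _cidrs = []  # current candidate ranges

    def fetch_cidrs():
        # Placeholder: fetch the published ranges (e.g. CIDRS records) here.
        return [ipaddress.ip_network("192.0.2.0/24")]

    def refresh_loop(interval_s=300):
        global _cidrs
        while True:
            ranges = fetch_cidrs()
            with _lock:
                _cidrs = ranges
            time.sleep(interval_s)

    def allowed(ip):
        # Connection-time check: local memory only, no added latency.
        addr = ipaddress.ip_address(ip)
        with _lock:
            return any(addr in net for net in _cidrs)

    threading.Thread(target=refresh_loop, daemon=True).start()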

The proposed semantics in the draft are not equivalent to what is achieved 
using PTR records.  Perhaps that relationship could be adjusted.  For example, 
if a CIDRS record means “all addresses in these ranges carry PTR records that 
fall under this name”, that would be easier to understand than something about 
application behaviors of clients that resolve the owner name.
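
For illustration, those semantics would reduce to a purely mechanical check, 
roughly like the sketch below (the owner name and ranges are hypothetical, 
and the record parsing is assumed):

    # Hypothetical check of the suggested CIDRS semantics: the address must
    # fall in a published range AND its PTR name must fall under the owner.
    import ipaddress
    import socket

    OWNER = "example.com"
    CIDRS = [ipaddress.ip_network("192.0.2.0/24"),
             ipaddress.ip_network("2001:db8::/32")]  # assumed record contents

    def ptr_consistent(ip):
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in CIDRS):
            return False
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)  # PTR lookup
        except OSError:
            return False  # no PTR record: the assertion fails here
        return hostname == OWNER or hostname.endswith("." + OWNER)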

...
When I create an IP-address-based firewall rule, I do so based on some 
assumption, such as a reputation lookup (is this ASN trustworthy?). That 
assumption is history, unless someone out there is re-consulting ASN 
reputation JIT.

The concern I was expressing here was about complex, surprising, hidden state 
(“hysteresis”) based on previous local activity.  “ASN reputation” state feeds 
the configuration, so it should be legible to the administrator.

If someone asks the helpdesk “why can’t I ssh to 2001:db8::1 anymore”, and the 
answer is “you need to re-run ‘dig ssh.example.com’ first to re-prime the 
firewall”, this is a bad firewall design.

...
It also messes with DNS (which is a lookup protocol, not a signaling protocol). 
 For example, it creates perverse interactions with DNS stub caching, which one 
might have to disable in order to generate the DNS query activity that will 
cause a block to be lifted.

Implementations may end up doing this, sure, but it isn't inevitable. I know my 
previous employer's implementation of DNS-based allowlisting has a long-term 
approval period that well exceeds most TTL values because, in real life, 
everyone continues using resolutions for as long as possible. Breaking 
connectivity just because a cache was cleared (for any reason) was deemed 
unacceptable. In other words: this is up to the enforcement implementation.
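
In sketch form (hypothetical names and numbers, not the actual product's 
logic), that kind of long-term approval looks like this:

    # When the enforcement point observes a resolution of an allowlisted
    # name, it approves the returned addresses for a window that ignores
    # the (usually much shorter) DNS TTL, so a cleared cache doesn't break
    # connectivity.
    import time

    APPROVAL_S = 7 * 24 * 3600  # one week, deliberately >> typical TTLs
    _approved = {}  # ip -> approval expiry timestamp

    def on_dns_answer(name, ips, ttl):
        expiry = time.time() + max(ttl, APPROVAL_S)
        for ip in ips:
            _approved[ip] = max(_approved.get(ip, 0), expiry)

    def is_allowed(ip):
        return _approved.get(ip, 0) > time.time()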

This seems like a great example of how this is going to fail in nasty, 
confusing ways.  QUIC connections will happily live for days, for example, but 
this mechanism means that _sometimes_ those connections are going to fail 
because of an invisible timeout.  Those failures will probably exhibit 
blackhole behavior, resulting in an outage (likely at least 30 seconds) 
while the connection attempts to get through, eventually gives up, and falls 
back to an application-level reconnect (which may also be user-visible).

No matter the mechanism used, any attempt to validate a policy decision point's 
opinion on the access rights of a given destination is more and more 
encouraged to be time bound. I'm not buying a "momentary need to reconnect" 
once every few days as an argument against anything in common networking 
scenarios.

This could easily mean that my client hangs for 30 seconds every morning when I 
sit down at my workstation.

How would a policy decision point decide to revoke permission to access a given 
network segment, when a node in that segment is known to be compromised, if 
it's afraid it might break a long-running connection?

Presumably by re-checking the permission instead of revoking it.  This is the 
problem I’m trying to highlight: the firewall doesn't maintain its own state in 
this design, but only updates it as a side effect of the client’s DNS activity, 
which was _not_ designed for this purpose (and therefore lacks support for 
things like “refreshing the lease”).
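
A firewall that owned its state could refresh the lease itself, along these 
lines (revalidate() is a stand-in for whatever policy re-check applies; none 
of this is in the draft):

    # The firewall re-validates each allow entry before it expires, rather
    # than waiting for the client to happen to re-query DNS.
    import time

    LEASE_S = 3600
    _leases = {}  # ip -> lease expiry timestamp

    def revalidate(ip):
        # Placeholder: re-run the original policy decision (re-resolve the
        # name, consult reputation, etc.) without touching live connections.
        return True

    def maintain_leases():
        now = time.time()
        for ip, expiry in list(_leases.items()):
            if expiry - now < LEASE_S / 4:  # nearing expiry: refresh early
                if revalidate(ip):
                    _leases[ip] = now + LEASE_S
                else:
                    del _leases[ip]  # revoked by the firewall itself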

...
When an app developed by Foo Enterprises offers data backup integration with 
DropBox, OneDrive, or whatever else, is Foo Enterprises supposed to push app 
updates whenever they divine that DropBox or Microsoft has changed their 
associated CIDRs?

No, it’s supposed to use DropBox or Microsoft endpoints by their domain name, 
as is the norm.

That would be ideal, but not in line with real-world expectations; it's akin to 
saying every remote IP address in use must come from an A/AAAA answer for a 
domain name.

That seems like a fine rule for an ultra-paranoid enterprise network that wants 
to allowlist access by domain names.

...
Your step 3 brings me back to this draft. Communicating these IP addresses to 
the firewall, wherever it resides, is simplified if the CIDRs for a name (the 
thing a consistent access policy is referencing) can be looked up rather than 
regularly scraped by hand from a hodgepodge of sources.

Manual firewall config by the admin is very different from what I see in this 
draft, which appears to rely on monitoring of the client’s DNS queries to 
trigger a fetch for the associated CIDRS records.  Perhaps I’m not 
understanding your intended use case correctly.

For manual firewall config, I would strongly recommend leaving DNS out of this 
entirely, and instead use a URL for a JSON blob like AWS does: 
https://docs.aws.amazon.com/vpc/latest/userguide/aws-ip-syntax.html.  That 
avoids the need to assume one “service” per domain name, DNSSEC, etc.
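
Consuming such a blob is a few lines of code; the sketch below pulls AWS's 
published ranges (the ip-ranges.amazonaws.com URL and JSON fields are 
documented at the page above; the service filter is just an example):

    # Fetch AWS's published IP ranges and emit the prefixes for one service,
    # suitable for feeding into a firewall's config pipeline.
    import json
    import urllib.request

    URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

    def prefixes_for(service="S3"):
        with urllib.request.urlopen(URL) as resp:
            data = json.load(resp)
        return [p["ip_prefix"] for p in data["prefixes"]
                if p["service"] == service]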

The majority of use cases I directly learned about are under NDA, but one easy 
example that isn't is WhatsApp. Predictable domain name lookups followed by 
contact to hard-coded IP addresses. Why?

From what I’ve seen of the various types of entities and their goals, I don’t 
think WhatsApp is going to be a good motivating use case for this draft.  (See 
e.g. https://faq.whatsapp.com/1299035810920553)

—Ben