On Jul 13, 2025, at 4:43 PM, Tommy Jensen <[email protected]> wrote:

...

>> For logging and audit purposes the standard practice is to use PTR records, which are more precise than CIDRS and have clearer semantics.

> Tying into what I've said further down: this implies JIT queries at every connection. For enforcement, this introduces unnecessary latency at connection time wherever it is used, versus the ability to independently refresh the candidate CIDRs, which can then be consulted JIT as a local cache.

The proposed semantics in the draft are not equivalent to what is achieved using PTR records. Perhaps that relationship could be adjusted. For example, if a CIDRS record means "all addresses in these ranges carry PTR records that fall under this name", that would be easier to understand than something about the application behavior of clients that resolve the owner name.
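To make that concrete, a spot-check of those proposed semantics would look something like the sketch below (Python with dnspython; the owner name and ranges are invented stand-ins, since no resolver serves CIDRS records today):

    # Spot-check of the proposed semantics: every address in the
    # advertised ranges should carry a PTR record whose target falls
    # under the owner name. CIDRS is a draft record type, so the
    # ranges are hardcoded stand-ins here. Requires dnspython.
    import ipaddress

    import dns.name
    import dns.resolver
    import dns.reversename

    OWNER = dns.name.from_text("service.example.com")  # hypothetical
    RANGES = ["192.0.2.0/24", "2001:db8::/48"]          # stand-in RDATA

    def ptr_under_owner(addr: str) -> bool:
        """True if addr has a PTR record targeting a name under OWNER."""
        rev = dns.reversename.from_address(addr)
        try:
            answers = dns.resolver.resolve(rev, "PTR")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return False  # no PTR at all: the stated property fails
        return any(r.target.is_subdomain(OWNER) for r in answers)

    for cidr in RANGES:
        net = ipaddress.ip_network(cidr)
        sample = str(next(net.hosts()))  # probe one address per range
        print(cidr, "ok" if ptr_under_owner(sample) else "MISMATCH")

A relationship like that is verifiable by anyone with a resolver, which is what I mean by clearer semantics.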
...

> When I create an IP address based firewall rule, I do so based on some assumption, such as a reputation lookup (is this ASN trustworthy?). That is history, unless someone out there is re-consulting ASN reputation JIT.

The concern I was expressing here was about complex, surprising, hidden state ("hysteresis") based on previous local activity. "ASN reputation" state feeds the configuration, so it should be legible to the administrator. If someone asks the helpdesk "why can't I ssh to 2001:db8::1 anymore", and the answer is "you need to re-run 'dig ssh.example.com' first to re-prime the firewall", that is a bad firewall design.

...

>> It also messes with DNS (which is a lookup protocol, not a signaling protocol). For example, it creates perverse interactions with DNS stub caching, which one might have to disable in order to generate the DNS query activity that will cause a block to be lifted.

> Implementations may end up doing this, sure, but it isn't inevitable. I know my previous employer's implementation of DNS-based allowlisting has a long-term approval period that well exceeds most TTL values, because in real life everyone keeps using resolutions for as long as possible. Breaking connectivity just because a cache was cleared (for any reason) was deemed unacceptable. In other words: this is up to the implementation of the enforcement.

This seems like a great example of how this is going to fail in nasty, confusing ways. QUIC connections will happily live for days, for example, but this mechanism means that _sometimes_ those connections are going to fail because of an invisible timeout. Those failures will probably exhibit blackhole behavior, resulting in an outage (of probably at least 30 seconds) while the connection attempts to get through, eventually gives up, and falls back to an application-level reconnect (which may also be user-visible).

> No matter the mechanism used, any attempt to validate a policy decision point's opinion of a given destination's access rights is increasingly encouraged to be time-bound. I'm not buying a "momentary need to reconnect" once every few days as an argument against anything in common networking scenarios.

This could easily mean that my client hangs for 30 seconds every morning when I sit down at my workstation.

> How would a policy decision point revoke permission to access a given network segment when a node in that segment is known to be compromised, if it's afraid it might break a long-running connection?

Presumably by re-checking the permission instead of revoking it. This is the problem I'm trying to highlight: the firewall doesn't maintain its own state in this design; it only updates that state as a side effect of the client's DNS activity, which was _not_ designed for this purpose (and therefore lacks support for things like "refreshing the lease"). The sketch below shows the failure mode I mean.
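A toy model of the state machine the draft seems to imply (mine, not any shipping implementation; names and numbers are invented):

    # Toy model of DNS-snooping enforcement: allow entries exist only
    # as a side effect of observed client lookups and expire on a
    # timer the client never sees.

    class SnoopingAllowlist:
        def __init__(self) -> None:
            self._expiry: dict[str, float] = {}  # dst IP -> expiry time

        def on_dns_answer(self, ip: str, ttl: int, now: float) -> None:
            # The ONLY way state is created or refreshed.
            self._expiry[ip] = now + ttl

        def permits(self, ip: str, now: float) -> bool:
            # Nothing here extends the entry for an established flow;
            # DNS offers no "refresh the lease" operation.
            return now < self._expiry.get(ip, 0.0)

    fw = SnoopingAllowlist()
    fw.on_dns_answer("192.0.2.1", ttl=300, now=0.0)  # one client lookup
    print(fw.permits("192.0.2.1", now=10.0))    # True: QUIC flow set up
    # Days later the flow is still alive, but the stub cache (or the
    # absence of any new connection) means no query has re-armed it:
    print(fw.permits("192.0.2.1", now=86400.0))  # False: mid-flow blackhole

Nothing in the client stack knows that second check is coming, which is exactly the hidden hysteresis I'm objecting to.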
...

>>> When an app developed by Foo Enterprises offers data backup integration with DropBox, OneDrive, or whatever else, is Foo Enterprises supposed to push app updates whenever they divine that DropBox or Microsoft has changed its associated CIDRs?

>> No, it's supposed to reach the DropBox or Microsoft endpoints by their domain names, as is the norm.

> That would be ideal, but it's not in line with real-world expectations; it's akin to saying all uses of remote IP addresses must be A/AAAA values for a domain name.

That seems like a fine rule for an ultra-paranoid enterprise network that wants to allowlist access by domain names.

...

> Your step 3 brings me back to this draft. Communicating these IP addresses to the firewall, wherever it resides, is simplified if the CIDRs for a name (the thing a consistent access policy references) can be looked up, rather than regularly scraped by hand from a hodgepodge of sources.

Manual firewall configuration by the admin is very different from what I see in this draft, which appears to rely on monitoring the client's DNS queries to trigger a fetch of the associated CIDRS records. Perhaps I'm not understanding your intended use case correctly. For manual firewall configuration, I would strongly recommend leaving DNS out of this entirely and instead publishing a URL for a JSON blob, as AWS does: https://docs.aws.amazon.com/vpc/latest/userguide/aws-ip-syntax.html. That avoids the need to assume one "service" per domain name, DNSSEC, etc. (A sketch of consuming such a blob is in the P.S. below.)

> The majority of use cases I directly learned about are under NDA, but one easy example that isn't is WhatsApp: predictable domain name lookups followed by contact to hard-coded IP addresses. Why?

From what I've seen of the various types of entities and their goals, I don't think WhatsApp is going to be a good motivating use case for this draft. (See e.g. https://faq.whatsapp.com/1299035810920553)

—Ben
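P.S. For concreteness, here's roughly what consuming a published JSON blob looks like (this uses the AWS feed from the page linked above; the "S3" service filter and the printed rule format are arbitrary examples):

    # Sketch: turn a vendor-published JSON blob into firewall input.
    # The feed URL is AWS's published one; the service filter ("S3")
    # and the output format are placeholders.
    import json
    import urllib.request

    FEED = "https://ip-ranges.amazonaws.com/ip-ranges.json"

    with urllib.request.urlopen(FEED) as resp:
        data = json.load(resp)

    cidrs = sorted(
        {p["ip_prefix"] for p in data["prefixes"] if p["service"] == "S3"}
        | {p["ipv6_prefix"] for p in data["ipv6_prefixes"] if p["service"] == "S3"}
    )
    print(data["createDate"], len(cidrs), "prefixes")  # when/what, for audit
    for cidr in cidrs:
        print("allow dst", cidr)  # adapt to your rule syntax

Note that the publication date gives exactly the audit trail DNS snooping lacks, and the fetch runs on the admin's schedule rather than the client's.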
