On 7/8/25 08:32, Ben Schwartz wrote:


On Jul 7, 2025, at 8:13 PM, Tommy Jensen <[email protected]> wrote:

Comments in-line. Meta note: it seems your concerns are all focused on the way allow/block traffic enforcement consumes this information. Do you have any concerns with the ability to discover associations for logging and audit purposes?

For logging and audit purposes the standard practice is to use PTR records, which are more precise than CIDRs and have clearer semantics.

Tying into what I've said further down: this implies JIT queries at every connection. For enforcement, this introduces unnecessary latency at connection time versus the ability to independently refresh the candidate CIDRs, which can then be consulted JIT as a local cache. For auditing and logging, it potentially means many query messages per app/service experience that could instead have been 1-2 transactions, depending on how many unique IP addresses are contacted.
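To make the latency point concrete, here is a minimal sketch of the cached-CIDR approach (the domain name and prefixes are illustrative only, not from the draft): the connection-time check is a pure in-memory match, and the CIDR set is refreshed out of band rather than queried per connection.

```python
import ipaddress

# Hypothetical cache of CIDRs discovered per domain, refreshed
# independently of connection handling (entries are illustrative).
cidr_cache = {
    "example.com": [ipaddress.ip_network("192.0.2.0/24"),
                    ipaddress.ip_network("2001:db8::/32")],
}

def allowed(dst: str) -> bool:
    # Connection-time check: an in-memory containment scan,
    # with no blocking DNS (e.g. PTR) query on the data path.
    addr = ipaddress.ip_address(dst)
    return any(addr in net for nets in cidr_cache.values() for net in nets)
```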

Another benefit: CIDRs can specify arbitrary length prefixes, whereas PTR requires multiple entries when prefixes do not align with octet/nibble boundaries.
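As a sketch of that alignment cost (assuming the standard octet-boundary in-addr.arpa and nibble-boundary ip6.arpa delegation conventions), this splits a CIDR into the reverse-DNS zones needed to cover it; a single /22 needs four /24-aligned zones:

```python
import ipaddress

def reverse_zones(cidr):
    # Split a CIDR into the reverse-DNS zones needed to cover it.
    # IPv4 in-addr.arpa delegations align to octet boundaries (/8, /16, /24);
    # IPv6 ip6.arpa delegations align to nibble boundaries (multiples of 4).
    net = ipaddress.ip_network(cidr)
    step = 8 if net.version == 4 else 4
    aligned = -(-net.prefixlen // step) * step      # round prefix up to boundary
    total = 4 if net.version == 4 else 32           # labels in a full reverse name
    zones = []
    for sub in net.subnets(new_prefix=aligned):
        labels = sub.network_address.reverse_pointer.split(".")
        keep = aligned // step                      # significant labels for this zone
        zones.append(".".join(labels[total - keep:]))
    return zones
```

For example, `reverse_zones("10.0.0.0/22")` yields four zones, while the CIDR itself is a single entry.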



On 7/7/25 15:58, Ben Schwartz wrote:
Thanks for the explanation.

I have serious concerns about proposals that would encourage blocking/unblocking IP addresses based on previous DNS activity.  If your network's firewall behavior depends on the history of DNS queries, this creates an extreme form of stateful protocol ossification that prevents IP from working correctly. It's like NAT but worse, because the stateful behaviors at the IP layer depend on history from "outside" of IP.

The comparison with NAT seems like a weird apples-to-mangoes comparison.

They’re both situations where the apparent behavior of the IP layer is inconsistent, and depends on prior history.  This breaks the end-to-end IP model, complicates debugging, etc.  I could have said “stateful firewall” instead.

...
Anyway, how is this different from enforcement of IP allow/block based on other dynamic, non-IP logic such as which process it originated from or what the current time is, both common practices today that are "history from 'outside' of IP"?

Neither of those are “history”.  They are information contemporaneous with the access decision.

Ok, for those two examples that's fair.

When I create an IP-address-based firewall rule, I do so based on some assumption, such as a reputation lookup (is this ASN trustworthy?). That is history, unless someone out there is re-consulting ASN reputation JIT. Whether the reasoning behind an allow/block/route decision is the believed association of an IP address with a domain name, an ASN, a company identity prior to a merger, a service being active versus deprecated, or anything else, it is not in any way "like NAT but worse". That includes inputs that are themselves previous API calls like DNS, such as ASN reputation or ownership lookups. The firewall and the IP layer don't care about any of that, including domain name mappings. They may be operating on false assumptions, which are the responsibility of the entity plumbing the firewall rules, but that has always been the case.



It also messes with DNS (which is a lookup protocol, /not/ a signaling protocol).  For example, it creates perverse interactions with DNS stub caching, which one might have to /disable/ in order to generate the DNS query activity that will cause a block to be lifted.

Implementations may end up doing this, sure, but it isn't inevitable. I know my previous employer's implementation of DNS-based allowlisting has a long-term approval time period that well exceeds most TTL values, because in real life, everyone continues using resolutions for as long as possible. Breaking connectivity just because a cache was cleared (for any reason) was deemed unacceptable. In other words: this is up to the implementation of the enforcement.
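As a sketch of that decoupling (the lifetime value is a hypothetical policy knob, not from any real product): the enforcement point keeps its own approval window, independent of the record's TTL, so a cleared resolver cache does not revoke connectivity.

```python
import time

# Hypothetical policy window, deliberately well beyond typical TTL values.
APPROVAL_LIFETIME = 24 * 3600

approvals = {}  # ip (str) -> approval expiry (unix timestamp)

def record_resolution(ip):
    # Called whenever the resolver answers a query for an allowed name;
    # the approval outlives the DNS record's TTL by design.
    approvals[ip] = time.time() + APPROVAL_LIFETIME

def is_approved(ip):
    # Connection-time check: no dependency on the DNS cache's current state.
    return approvals.get(ip, 0) > time.time()
```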

This seems like a great example of how this is going to fail in nasty, confusing ways.  QUIC connections will happily live for days, for example, but this mechanism means that _sometimes_ those connections are going to fail because of an invisible timeout.  Those failures will probably exhibit blackhole behavior, resulting in an outage (of probably at least 30 seconds) while the connection attempts to get through, eventually gives up, and falls back to an application-level reconnect (which may also be user-visible).

No matter the mechanism used, attempts to validate a policy decision point's opinion on the access rights of a given destination are increasingly expected to be time bound. I'm not buying a momentary need to reconnect once every few days as an argument against anything in common networking scenarios. How would a policy decision point revoke permission to access a given network segment, when a node in that segment is known to be compromised, if it's afraid it might break a long-running connection? Just because a connection was trusted at handshake time does not mean it remains trusted for its lifetime by an operator of any segment of the threat model.



Network operators that want to limit network activity to allowed DNS domains should use a domain-based transport proxy such as HTTP CONNECT, so policies can be imposed /before/ DNS resolution, and each data flow is explicitly tied to its domain.

Two things: (1) you are assuming the deployer is a *network* operator and not a *device* operator. What about when there is no network "edge" to manage?

On-device enforcement seems like it doesn’t need this mechanism.  Apps can be identified reliably, and can ship their own network access policy signed by the publisher, etc.

That assumes the apps in question can do this, or will in any timely fashion. Not all network or device/endpoint operators have that luxury in their dependencies. A common scenario that complicates IPv6 migration is the inability to update apps for many years at a time, for non-networking reasons. Also, this focuses on the endpoint operator (which I did push for) but now rules out network operators. Both have name lookups in common, hence this draft's suggestion: give an option to operators who lack control over some subset of the end-to-end architecture, whose only alternative is attempted TLS termination and the net negative that introduces.

Another point on app manifests (which, by the way, are a good idea I agree with, especially for endpoints managed by the app dev): what about endpoints the app doesn't manage? When an app developed by Foo Enterprises offers data backup integration with Dropbox, OneDrive, or whatever else, is Foo Enterprises supposed to push app updates whenever it divines that Dropbox or Microsoft has changed its associated CIDRs? That would be ideal, but not in line with real-world expectations; it's similar to saying all uses of remote IP addresses must be A/AAAA values for a domain name.


...
It is absolutely true that trust in an endpoint, defined by any identifier, has risks. A firewall rule that blocks specific IP addresses isn't perfect when an allowed IP address will proxy traffic to those same IP addresses. I do not see how this draft introduces a new paradigm in that regard.

AFAICT this draft is only relevant in cases where
1. An application bootstraps via DNS
2. The application then begins to communicate with other IP addresses without resolving them from names.
3. These address literals haven't been communicated to the firewall in advance.
4. The firewall does not normally allow client-initiated access to these IPs.
5. The firewall wants this application to have access to unrecognized IPs.
6. The application’s traffic is not identified in any other way.

If our only examples of this usage pattern are cases (like TURN) where the policy is not an effective security measure, then it doesn’t make sense to build standards to support it.

Your step 3 brings me back to this draft. Communicating these IP addresses to the firewall, wherever it resides, is simplified if the CIDRs for a name (the thing consistent access policy is referencing) can be looked up rather than regularly scraped manually from a hodgepodge of sources. This is a very real customer story I witnessed repeatedly at my previous employer. Why can't the firewall be the DNS client in this draft's flow, collaborating with the DNS resolver used by managed endpoints for consistency (not getting different query results between endpoints, which happens for lots of reasons)? This draft is about distribution of information.


...
As for assigning DNS names to <whatever>... yes, I would very much like that, but this requires the same operator to control the managed endpoints *and* all services they connect to. That isn't reflective of reality, where everyone has dependencies on many third parties who can define their endpoints by domain name or IP addresses.

Third-party IP address literal dependencies are rare in client applications.  When they do exist, there is often no guarantee that they will stay within a particular CIDR.  For example, consumer VPN operators often distribute server IPs as IP literals, but they generally do not promise that those IPs will fall in any particular range.

...
Even though there are services which operate this way, many do limit and actively communicate their CIDR dependencies.

Could you point to some examples of services that fit this pattern _and_ require IP-literal-based communication with these endpoints from enterprise clients?

The majority of use cases I directly learned about are under NDA, but one easy example that isn't is WhatsApp: predictable domain name lookups followed by contact to hard-coded IP addresses. Why? I don't know, and as a network or endpoint operator with no control over the app's development, why should I care? The point of the draft is to give a mechanism to associate IP addresses with names in a standard way that avoids per-vendor manual documentation of these mappings, *without* having to worry about the long right tail of weird things apps do with networking. The draft is *not* attempting to say there aren't alternative approaches, many of which you've enumerated, just that this is a viable alternative in less-than-ideal situations where lack of control over apps/endpoints/services leads to a lack of options.


—Ben
_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]
