On 7/8/25 08:32, Ben Schwartz wrote:
On Jul 7, 2025, at 8:13 PM, Tommy Jensen <[email protected]> wrote:
Comments in-line. Meta note: it seems your concerns are all focused
on the way allow/block traffic enforcement consumes this information.
Do you have any concerns with the ability to discover associations
for logging and audit purposes?
For logging and audit purposes the standard practice is to use PTR
records, which are more precise than CIDRs and have clearer semantics.
Tying into what I've said further down about : this implies JIT queries
at every connection. For enforcement, this introduces unnecessary
latency at connection time wherever this is introduced versus the
ability to independently refresh the candidate CIDRs that can JIT be
consulted as a local cache. For auditing and logging, it's potentially a
lot of traffic messages per app/service experience that could instead
have been 1-2 transactions depending on how many unique IP addresses are
contacted.
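
As a rough illustration of the trade-off described above, the sketch below keeps a local per-name cache of CIDRs that is refreshed on its own schedule and consulted at connection time without sending any query. The `CidrCache` class, the `fetch_cidrs` callable, and the refresh interval are all hypothetical; the callable stands in for whatever lookup mechanism ends up being used and is not taken from the draft.

```python
import ipaddress
import time


class CidrCache:
    """Local cache of name -> CIDRs, refreshed independently of connections."""

    def __init__(self, fetch_cidrs, refresh_seconds=3600, clock=time.monotonic):
        self._fetch = fetch_cidrs          # hypothetical lookup: name -> list of CIDR strings
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._entries = {}                 # name -> (networks, fetched_at)

    def refresh(self, name):
        """Fetch the CIDRs for `name` only if they are missing or stale."""
        entry = self._entries.get(name)
        now = self._clock()
        if entry is None or now - entry[1] >= self._refresh_seconds:
            nets = [ipaddress.ip_network(c) for c in self._fetch(name)]
            self._entries[name] = (nets, now)

    def names_for(self, addr):
        """Connection-time check: purely local, no query is sent."""
        ip = ipaddress.ip_address(addr)
        return sorted(
            name
            for name, (nets, _) in self._entries.items()
            if any(ip.version == n.version and ip in n for n in nets)
        )
```

With this shape, fifty connections to distinct addresses inside one cached prefix cost one fetch in total, whereas a PTR-per-connection approach would issue one query per unique address.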
Another benefit: CIDRs can specify arbitrary length prefixes, whereas
PTR requires multiple entries when prefixes do not align with
octet/nibble boundaries.
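
To make the boundary point concrete, here is a minimal sketch that counts how many reverse-DNS names are needed to cover a prefix when it has to be split on octet (IPv4) or nibble (IPv6) boundaries. The helper name `reverse_zones` is hypothetical, and the sketch deliberately ignores classless delegation tricks such as RFC 2317; it only shows the arithmetic.

```python
import ipaddress


def reverse_zones(cidr):
    """List the reverse-DNS names needed to cover `cidr` on label boundaries."""
    net = ipaddress.ip_network(cidr)
    step = 8 if net.version == 4 else 4      # bits per reverse-DNS label
    # Round the prefix length up to the next octet/nibble boundary.
    aligned = -(-net.prefixlen // step) * step
    subnets = [net] if aligned == net.prefixlen else net.subnets(new_prefix=aligned)
    drop = (net.max_prefixlen - aligned) // step   # host-part labels to strip
    zones = []
    for sub in subnets:
        labels = sub.network_address.reverse_pointer.split(".")
        zones.append(".".join(labels[drop:]))
    return zones


print(len(reverse_zones("192.0.2.0/24")))     # 1
print(len(reverse_zones("198.51.100.0/22")))  # 4
print(len(reverse_zones("2001:db8::/33")))    # 8
```

A single CIDR string expresses each of these ranges in one entry, while the reverse tree needs one name per aligned block.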
On 7/7/25 15:58, Ben Schwartz wrote:
Thanks for the explanation.
I have serious concerns about proposals that would encourage
blocking/unblocking IP addresses based on previous DNS activity. If
your network's firewall behavior depends on the history of DNS
queries, this creates an extreme form of stateful protocol
ossification that prevents IP from working correctly. It's like NAT
but worse, because the stateful behaviors at the IP layer depend on
history from "outside" of IP.
The comparison with NAT seems like a weird apples-to-mangoes comparison.
They’re both situations where the apparent behavior of the IP layer is
inconsistent, and depends on prior history. This breaks the
end-to-end IP model, complicates debugging, etc. I could have said
“stateful firewall” instead.
...
Anyway, how is this different from enforcement of IP allow/block
based on other dynamic, non-IP logic such as which process it
originated from or what the current time is, both common practices
today that are "history from 'outside' of IP"?
Neither of those are “history”. They are information contemporaneous
with the access decision.
Ok, for those two examples that's fair.
When I create an IP-address-based firewall rule, I do so based on
some assumption such as a reputation lookup (is this ASN trustworthy?).
That is history, unless someone out there is re-consulting ASN
reputation JIT. Whether the reasoning for making an allow/block/route
decision is based on the believed-to-be association of the IP address
with a domain name, or an ASN, or a company identity prior to a
merger, or a service being active versus deprecated, or, or, or... is
not in any way "like NAT but worse" including those that are also
previous API calls like the DNS such as ASN reputation or ownership. The
firewall, the IP layer, they don't care about any of that including
domain name mappings. They may be operating on false assumptions, which
are the responsibility of the entity plumbing the firewall rules, but
that has always been the case.
It also messes with DNS (which is a lookup protocol, /not/ a
signaling protocol). For example, it creates perverse interactions
with DNS stub caching, which one might have to /disable/ in order to
generate the DNS query activity that will cause a block to be lifted.
Implementations may end up doing this, sure, but it isn't inevitable.
I know my previous employer's implementation of DNS-based
allowlisting has a long-term approval time period that well exceeds
most TTL values, because in real life, everyone continues using
resolutions for as long as possible. Breaking connectivity just
because a cache was cleared (for any reason) was deemed unacceptable.
In other words: this is up to the implementation of the enforcement.
This seems like a great example of how this is going to fail in nasty,
confusing ways. QUIC connections will happily live for days, for
example, but this mechanism means that _sometimes_ those connections
are going to fail because of an invisible timeout. Those failures
will probably exhibit blackhole behavior, resulting in an outage (of
probably at least 30 seconds) while the connection attempts to get
through, eventually gives up, and falls back to an application-level
reconnect (which may also be user-visible).
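
The behavior under discussion here can be sketched as follows: an allowlist in which an address seen in a DNS answer stays approved for a fixed period that is longer than the record's TTL, and then silently stops being approved. This is only an illustration of the idea, not the implementation referred to above; the `DnsAllowlist` class, its method names, and the seven-day default are assumptions.

```python
import time


class DnsAllowlist:
    """Addresses seen in DNS answers stay approved for a long, fixed period."""

    def __init__(self, approval_seconds=7 * 24 * 3600, clock=time.monotonic):
        self._approval_seconds = approval_seconds   # assumed default: 7 days
        self._clock = clock
        self._expiry = {}                           # address -> expiry timestamp

    def observe(self, addr, ttl):
        """Record an address from a DNS answer; approval outlives the TTL."""
        lifetime = max(ttl, self._approval_seconds)
        new_expiry = self._clock() + lifetime
        # Never shorten an approval that is already in place.
        self._expiry[addr] = max(new_expiry, self._expiry.get(addr, 0.0))

    def allowed(self, addr):
        """True while the approval period for `addr` has not run out."""
        expiry = self._expiry.get(addr)
        return expiry is not None and self._clock() < expiry
```

A cleared stub cache does not affect `allowed()`, since nothing here depends on the TTL running out. A flow that outlives the approval period without a fresh observation does lose its approval, which is the invisible-timeout case raised above.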
No matter the mechanism used, any attempt to validate a policy decision
point's opinion on the access rights of a given destination is
increasingly encouraged to be time-bound. I'm not buying a "momentary
need to reconnect" once every few days as an argument against anything in
common networking scenarios. How would a policy decision point decide to
revoke permission to access a given network segment if a node in that
segment is known to be compromised if it's afraid it might break a
long-running connection? Just because a connection was trusted at
handshake time does not mean it remains trusted for its lifetime by an
operator of any segment of the threat model.
Network operators that want to limit network activity to allowed DNS
domains should use a domain-based transport proxy such as HTTP
CONNECT, so policies can be imposed /before/ DNS resolution, and each
data flow is explicitly tied to its domain.
Two things: (1) you are assuming the deployer is a *network* operator
and not a *device* operator. What about when there is no network
"edge" to manage?
On-device enforcement seems like it doesn’t need this mechanism. Apps
can be identified reliably, and can ship their own network access
policy signed by the publisher, etc.
That assumes the apps in question can do this, or will in any timely
fashion. Not all network or device/endpoint operators have that luxury
in their dependencies. A common scenario that complicates IPv6 migration
is the inability to update apps for many years at a time for
non-networking reasons. Also, this focuses on the endpoint operator
(which I did push for) but now rules out network operators. Both have
name lookups in common, hence this draft's suggestion to give an option
for operators without control over some subset of the end-to-end
architecture other than attempts at TLS termination and the net negative
that introduces.
Another point for app manifests (which btw are a good idea I agree with,
especially for endpoints managed by the app dev): what about endpoints
the app doesn't manage? When an app developed by Foo Enterprises offers
data backup integration with DropBox, OneDrive, or whatever else, is Foo
Enterprises supposed to push app updates when they divine that DropBox
or Microsoft changes their associated CIDRs? That would be ideal, but
not in line with real-world expectations, similar to saying all uses of
remote IP addresses must be A/AAAA values for a domain name.
...
It is absolutely true that trust in an endpoint, defined by any
identifier, has risks. A firewall rule that blocks specific IP
addresses isn't perfect when an allowed IP address will proxy traffic
to those same IP addresses. I do not see how this draft introduces a
new paradigm in that regard.
AFAICT this draft is only relevant in cases where
1. An application bootstraps via DNS
2. The application then begins to communicate with other IP addresses
without resolving them from names.
3. These address literals haven’t been communicated to the firewall in
advance.
4. The firewall does not normally allow client-initiated access to
these IPs.
5. The firewall wants this application to have access to unrecognized IPs.
6. The application’s traffic is not identified in any other way.
If our only examples of this usage pattern are cases (like TURN) where
the policy is not an effective security measure, then it doesn’t make
sense to build standards to support it.
Your step 3 brings me back to this draft. Communicating these IP
addresses to the firewall, wherever it resides, is simplified if the
CIDRs for a name (the thing consistent access policy is referencing) can
be looked up rather than regularly scraped manually from a hodgepodge
of sources. This is a very real customer story I witnessed repeatedly at
my previous employer. Why can't the firewall be the DNS client in this
draft's flow, and collaborating with the DNS resolver used by managed
endpoints for consistency (not getting different query results between
endpoints, which happens for lots of reasons)? This draft is about
distribution of information
...
As for assigning DNS names to <whatever>... yes, I would very much
like that, but this requires the same operator to control the managed
endpoints *and* all services they connect to. That isn't reflective
of reality, where everyone has dependencies on many third parties who
can define their endpoints by domain name or IP addresses.
Third-party IP address literal dependencies are rare in client
applications. When they do exist, there is often no guarantee that
they will stay within a particular CIDR. For example, consumer VPN
operators often distribute server IPs as IP literals, but they
generally do not promise that those IPs will fall in any particular range.
...
Even though there are services which operate this way, many do limit
and actively communicate their CIDR dependencies.
Could you point to some examples of services that fit this pattern
_and_ require IP-literal-based communication with these endpoints from
enterprise clients?
The majority of use cases I directly learned about are under NDA, but
one easy example that isn't is WhatsApp. Predictable domain name lookups
followed by contact to hard-coded IP addresses. Why? I don't know, and
as a network or endpoint operator with no control over the app's
development, why should I care? The point of the draft is to give a
mechanism to associate IP addresses with names in a standard way that
avoids per-vendor manual documentation of these mappings *without*
having to worry about the long right tail of weird things apps do with
networking. The draft is *not* attempting to say there aren't
alternative approaches, many of which you've enumerated, just that this is
a viable alternative in less-than-ideal situations where lack of control
over apps/endpoints/services leads to a lack of options.
—Ben
_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]