Hello,
thank you for detailed bug report. It clearly shows there is something odd.
On Mon, May 08, 2023 at 01:30:24AM +0300, Joosep wrote:
</snip>
> When looking through the pf debug-level logs, 2 events stand up - first one
> has no problems, second one fails.Note that source port 53081 for nat exist
> in
> 2 different events (this particular test run was 66 dns requests long
> on an otherwise idle system):
> May 8 03:35:53 lab73 /bsd: pf: key search, in on vlan5: UDP wire: (0)
> 10.10.11.3:25217 10.10.10.1:53
> May 8 03:35:53 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.11.3:25217
> 10.10.10.1:53 stack: (0) -
> May 8 03:35:53 lab73 /bsd: pf: key search, out on em2: UDP wire: (0)
> 10.10.10.1:53 10.10.11.3:25217
> May 8 03:35:53 lab73 /bsd: pf: pf_map_addr: selected address 10.10.10.2
> May 8 03:35:53 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.10.1:53
> 10.10.10.2:53081 stack: (0) 10.10.10.1:53 10.10.11.3:25217
> May 8 03:35:53 lab73 /bsd: pf: key search, in on em2: UDP wire: (0)
> 10.10.10.1:53 10.10.10.2:53081
> May 8 03:35:53 lab73 /bsd: pf: key search, out on vlan5: UDP wire: (0)
> 10.10.11.3:25217 10.10.10.1:53
> .....
> May 8 03:36:02 lab73 /bsd: pf: key search, in on vlan5: UDP wire: (0)
> 10.10.11.3:17189 10.10.10.1:53
> May 8 03:36:02 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.11.3:17189
> 10.10.10.1:53 stack: (0) -
> May 8 03:36:02 lab73 /bsd: pf: key search, out on em2: UDP wire: (0)
> 10.10.10.1:53 10.10.11.3:17189
> May 8 03:36:02 lab73 /bsd: pf: pf_map_addr: selected address 10.10.10.2
> May 8 03:36:02 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.10.1:53
> 10.10.10.2:53081 stack: (0) 10.10.10.1:53 10.10.11.3:17189
> May 8 03:36:02 lab73 /bsd: pf: wire key attach failed on all: UDP out
> wire: (0) 10.10.10.1:53 10.10.10.2:53081 1:0 @3, existing: UDP out wire:
> (0) 10.10.10.1:53 10.10.10.2:53081 stack: (0) 10.10.10.1:53 10.10.11.3:25217
> 2:1 @3
> May 8 03:36:07 lab73 /bsd: pf: key search, in on vlan5: UDP wire: (0)
> 10.10.11.3:17189 10.10.10.1:53
> May 8 03:36:07 lab73 /bsd: pf: key search, out on em2: UDP wire: (0)
> 10.10.10.1:53 10.10.11.3:17189
> May 8 03:36:07 lab73 /bsd: pf: pf_map_addr: selected address 10.10.10.2
> May 8 03:36:07 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.10.1:53
> 10.10.10.2:57351 stack: (0) 10.10.10.1:53 10.10.11.3:17189
> May 8 03:36:07 lab73 /bsd: pf: key search, in on em2: UDP wire: (0)
> 10.10.10.1:53 10.10.10.2:57351
> May 8 03:36:07 lab73 /bsd: pf: key search, out on vlan5: UDP wire: (0)
> 10.10.11.3:17189 10.10.10.1:53
>
> It seems to me (i might be wrong), that system selects source ports
> for nat in a way, that makes collisions likely.
>
this is very odd. source port for NAT is selected in pf_get_sport()
function found in pf_lb.c. the relevant part reads as follows:
182 do {
183 key.af = pd->naf;
184 key.proto = pd->proto;
185 key.rdomain = pd->rdomain;
...
203 } else if (low == 0 && high == 0) {
204 key.port[sidx] = pd->nsport;
...
215 } else {
216 u_int32_t tmp;
217
218 if (low > high) {
219 tmp = low;
220 low = high;
221 high = tmp;
222 }
223 /* low < high */
224 cut = arc4random_uniform(1 + high - low) + low;
225 /* low <= cut <= high */
226 for (tmp = cut; tmp <= high && tmp <= 0xffff;
++tmp) {
227 key.port[sidx] = htons(tmp);
228 if (pf_find_state_all(&key, dir, NULL) ==
229 NULL && !in_baddynamic(tmp, pd->proto))
{
230 *nport = htons(tmp);
231 return (0);
232 }
233 }
234 tmp = cut;
235 for (tmp -= 1; tmp >= low && tmp <= 0xffff; --tmp) {
236 key.port[sidx] = htons(tmp);
237 if (pf_find_state_all(&key, dir, NULL) ==
238 NULL && !in_baddynamic(tmp, pd->proto))
{
239 *nport = htons(tmp);
240 return (0);
241 }
242 }
243 }
If I understand code in parse.y right, the low is PF_NAT_PROXY_PORT_LOW
(50001)
and high is PF_NAT_PROXY_PORT_HIGH (65535), therefore port selection process
happens in else branch 216 - 243. There is a for loop at lines 235 - 240
which tries really hard to find unused port. The code uses
pf_find_state_all()
to make sure the selected port is unused in state table.
if this code suggests port number then pf_find_state_all() somehow
guaranties
the port can not be found in table. the odd thing is that
pf_state_key_attach()
reports that port number exists already in table (is in conflict).
both calls: call to pf_get_sport() and call to pf_state_key_attach() happen
under PF_LOCK(), so they are atomic nothing else can mess up with table.
after walking through the code I suspect we fail to initialize look up for
pf_find_state_all() properly, we forgot to compute a toeplitz hash. Can you
give a try to diff below to see if it fixes NAT?
thanks a lot
regards
sashan
--------8<---------------8<---------------8<------------------8<--------
diff --git a/sys/net/pf_lb.c b/sys/net/pf_lb.c
index a3d8d6333cd..b893dd819df 100644
--- a/sys/net/pf_lb.c
+++ b/sys/net/pf_lb.c
@@ -196,18 +196,24 @@ pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r,
/* XXX bug: icmp states dont use the id on both
* XXX sides (traceroute -I through nat) */
key.port[sidx] = pd->nsport;
+ key.hash = pf_pkt_hash(key.af, key.proto, &key.addr[0],
+ &key.addr[1], key.port[0], key.port[1]);
if (pf_find_state_all(&key, dir, NULL) == NULL) {
*nport = pd->nsport;
return (0);
}
} else if (low == 0 && high == 0) {
key.port[sidx] = pd->nsport;
+ key.hash = pf_pkt_hash(key.af, key.proto, &key.addr[0],
+ &key.addr[1], key.port[0], key.port[1]);
if (pf_find_state_all(&key, dir, NULL) == NULL) {
*nport = pd->nsport;
return (0);
}
} else if (low == high) {
key.port[sidx] = htons(low);
+ key.hash = pf_pkt_hash(key.af, key.proto, &key.addr[0],
+ &key.addr[1], key.port[0], key.port[1]);
if (pf_find_state_all(&key, dir, NULL) == NULL) {
*nport = htons(low);
return (0);
@@ -225,6 +231,9 @@ pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r,
/* low <= cut <= high */
for (tmp = cut; tmp <= high && tmp <= 0xffff; ++tmp) {
key.port[sidx] = htons(tmp);
+ key.hash = pf_pkt_hash(key.af, key.proto,
+ &key.addr[0], &key.addr[1], key.port[0],
+ key.port[1]);
if (pf_find_state_all(&key, dir, NULL) ==
NULL && !in_baddynamic(tmp, pd->proto)) {
*nport = htons(tmp);
@@ -234,6 +243,9 @@ pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r,
tmp = cut;
for (tmp -= 1; tmp >= low && tmp <= 0xffff; --tmp) {
key.port[sidx] = htons(tmp);
+ key.hash = pf_pkt_hash(key.af, key.proto,
+ &key.addr[0], &key.addr[1], key.port[0],
+ key.port[1]);
if (pf_find_state_all(&key, dir, NULL) ==
NULL && !in_baddynamic(tmp, pd->proto)) {
*nport = htons(tmp);
diff --git a/sys/net/pfvar_priv.h b/sys/net/pfvar_priv.h
index 741125e447e..147cd7d8f75 100644
--- a/sys/net/pfvar_priv.h
+++ b/sys/net/pfvar_priv.h
@@ -406,6 +406,9 @@ void pf_state_peer_hton(const struct
pf_state_peer *,
struct pfsync_state_peer *);
void pf_state_peer_ntoh(const struct pfsync_state_peer *,
struct pf_state_peer *);
+u_int16_t pf_pkt_hash(sa_family_t, uint8_t,
+ const struct pf_addr *, const struct pf_addr *,
+ uint16_t, uint16_t);
#endif /* _KERNEL */