On Fri, Feb 09, 2018 at 11:11:18AM +0100, Matthieu Herrb wrote:
> Hi,
>
> I've recently set up a new pair of OpenBSD 6.2 pf firewalls (with carp)
> in my lab, and that's not performing very well.
>
> tcp-based NFS v3 and v4 traffic (between Linux clients and a NetApp
> server) through it is struggling, and some SSH or HTTPS transfers are
> stalling, with their states disappearing from the state table.
>
> I'm trying to figure out what's going on to fix the issue.
>
Thanks to all who answered in private. With their advice and a bit of
personal research, it looks like this firewall pair is now working as
expected.

One of the main issues was caused by a server having 2 interfaces in 2
different vlans that are routed through this firewall. This generated
asymmetric routing, so the reply packets weren't traversing the firewall
and weren't updating the state, which stayed half-open for 30s before
expiring and cutting the connection. A bit of source routing on the
Linux side now forces the traffic to stay symmetric and everything's
fine.

Another issue seems to come from the fact that the new firewalls are
faster than the previous Cisco router. That apparently triggered bugs
in the vmxnet3 driver of CentOS 6 virtual machines. Upgrading to the
driver from open-vm-tools seems to have fixed the rest of the NFS
traffic issues.

The last point is that there seems to be a bug in the half-open
accounting code. The huge number I'm seeing here is in fact pretty
surely negative:

>
> The main anomaly I see is the huge number (and it keeps growing) of
> half-open tcp states, after 24h of uptime. See pfctl -vsi output
> below.
>
>   half-open tcp                  4294375902

This is 0xfff6f9de

So it seems that, either because of the asymmetric routing issue or
something else, the number of half-open connections is decremented
more often than it's incremented, which leads to this unsigned
wraparound.

But as Henning@ mentioned, this is only accounting and is not actually
used anywhere, so it shouldn't cause any real-life issue.

-- 
Matthieu Herrb
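
For anyone who wants to check the arithmetic on that counter, here is a
minimal sketch. It assumes the half-open counter is an unsigned 32-bit
value, which is a guess based on the reported number fitting in 32 bits
and not something checked against the pf source. The helper names
as_signed32 and decrement_u32 are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: the counter is 32 bits wide


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= MASK32
    # If the top bit is set, the signed reading is value - 2**32.
    return value - (1 << 32) if value & 0x80000000 else value


def decrement_u32(counter: int, n: int = 1) -> int:
    """Decrement with C-style unsigned wraparound instead of going negative."""
    return (counter - n) & MASK32


reported = 4294375902  # "half-open tcp" value from pfctl -vsi above

print(hex(reported))            # 0xfff6f9de
print(as_signed32(reported))    # -591394
print(decrement_u32(0))         # 4294967295, one decrement too many from zero
```

Read as a signed value, the counter is at -591394, i.e. it would have
been decremented about 591 thousand more times than it was incremented.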