Hi all,

Following on from my previous post: I needed to reload our PF ruleset today (pfctl -vvsr shows about 1270 rules total, FWIW).
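For what it's worth, one way to put numbers on what a reload does to the counters is to diff two `pfctl -si` snapshots taken either side of it. A minimal sketch (the snapshot paths and 30-second settle time are my own illustrative assumptions):

```shell
#!/bin/sh
# Sketch: measure how a ruleset reload moves the pfctl -si counters.
# Capture would look something like this (not exercised below):
#
#   pfctl -si > /tmp/pf.before
#   pfctl -nf /etc/pf.conf && pfctl -f /etc/pf.conf   # syntax-check, then load
#   sleep 30
#   pfctl -si > /tmp/pf.after

# Print how much a named pfctl -si counter grew between two snapshots.
counter_delta() {
    name=$1 before=$2 after=$3
    b=$(awk -v n="$name" '$1 == n { print $2 }' "$before")
    a=$(awk -v n="$name" '$1 == n { print $2 }' "$after")
    echo "$name: $((a - b))"
}

# e.g. counter_delta congestion /tmp/pf.before /tmp/pf.after
# e.g. counter_delta searches   /tmp/pf.before /tmp/pf.after
```

Comparing the `congestion` and `searches` deltas for a quiet interval against a post-reload interval would show whether the reload itself is what spikes them.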
As soon as I ran 'pfctl -f /etc/pf.conf', our external measurements showed a lot of jitter through the firewall; before that, jitter had been minimal for hours - nice flat graphs of ping responses :)

Looking at our collectd graphs for this period, congestion went from minimal to hovering around 60-80 packets/s, and state searches leapt from averaging about 400K/s to around 1.2M/s.

Sometimes reloading the PF ruleset again cures this, so I wondered if it is related to optimization (which we have not set, so it is at the default value), but in this case I had to reboot the firewall.

Based on this, does anyone have any idea where to look to identify the problem? Basically, reloading pf.conf seems to be what induces performance problems, and a reboot seems to be the most certain cure.

Regards,

*Kevin Gee*

On 31 May 2017 at 11:10, Kevin Gee <[email protected]> wrote:

> Hi all,
>
> We have an oldish (2013) but well-spec'd pair of servers (active-backup),
> running OpenBSD 6.0 and PF.
> The only difference between the server hardware is that the primary has
> two physical processors; the secondary has one.
>
> This primary firewall is worked pretty hard (see pfctl -si below), and of
> late it seems to struggle when load increases.
> If we fail over to the secondary, we see jitter/dropped packets on
> priority traffic.
> If we reload PF, we often see jitter (testing with world-ping) and drops
> of icmp (which has prio 7), which seem to settle if we reload PF again.
> If I make a change to the queue config, I need to reboot.
>
> Problems seem to coincide with spikes in congestion; congestion is usually
> approx 0.7/s, and if it rises much above 1 we see problems.
>
> Most of the CPU cores aren't used much; two of the 8 cores average about
> 40%, and one went up to 75% when I had a problem with a ruleset.
>
> I am trying to get hard figures rather than a 'feeling'. Stats that seem
> high when there are problems (see vmstat output at the bottom; when
> things are relatively quiet, context switching and interrupts are <3000):
>
> context-switching > 15,000
> interrupts > 14,000
> searches > 500,000
>
> net.inet.ip.ifq.len is usually < 100 (I've seen it at >700 briefly). This
> seems to suggest that changing net.inet.ip.ifq.maxlen may not make a
> difference.
>
> FWIW the ruleset as loaded is around 1300 lines when displayed with pfctl
> -vvsr.
>
> I am looking for ways to optimize performance and would appreciate any
> suggestions as to what to try and what stats to look at.
> The alternative is to buy new hardware, but I need to be convinced a
> faster processor will make a big difference.
>
> 1) I am thinking of trying higher values of net.inet.ip.ifq.maxlen,
> currently 2048. I tried 2500 and didn't see much difference, but suspect
> I can go quite a bit higher. Does this setting require a reboot, and am I
> right in thinking this may help congestion and lower interrupts and
> context-switching?
> 2) Memory use is low according to collectd/snmp graphs; we have plenty -
> can we utilise it more?
> 3) Is an upgrade to OpenBSD 6.1 likely to make a significant difference?
> 4) We log all dropped traffic to pflog0; will disk I/O be a problem?
>
> Sorry for the vagueness; thanks in advance.
> Kevin.
>
>
> Possibly useful output and spec below:
>
> Hardware:
>
> 2 x Quad Core Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz, 3600.54 MHz
> OpenBSD 6.0 GENERIC.MP#2 amd64
>
> NICs:
> Inside: type ix, 10Gbps, e.g. ix1 at pci5 dev 0 function 1 "Intel 82599"
> rev 0x01
> Outside and pfsync: type em, 1Gbps, e.g. em1 at pci2 dev 0 function 1
> "Intel I350" rev 0x01
>
> Of the 8 cores, two average about 40% utilisation. One of them peaked at
> about 75% when struggling.
> Memory = 64GB
>
>
> [LIVE]root@ar1300:~# pfctl -si
> Status: Enabled for 0 days 09:04:42 Debug: err
>
> State Table Total Rate
> current entries 1205635
> searches 16678281544 510320.1/s
> inserts 157481830 4818.6/s
> removals 156276195 4781.7/s
> Counters
> match 149125447 4562.9/s
> bad-offset 0 0.0/s
> fragment 0 0.0/s
> short 3395 0.1/s
> normalize 296 0.0/s
> memory 0 0.0/s
> bad-timestamp 0 0.0/s
> congestion 14523 0.4/s
> ip-option 0 0.0/s
> proto-cksum 0 0.0/s
> state-mismatch 103949 3.2/s
> state-insert 10397 0.3/s
> state-limit 0 0.0/s
> src-limit 0 0.0/s
> synproxy 0 0.0/s
> translate 0 0.0/s
> no-route 0 0.0/s
>
> [LIVE]root@ar1300:~# vmstat -si
> 4096 bytes per page
> 16257397 pages managed
> 16007891 pages free
> 13720 pages active
> 4146 pages inactive
> 0 pages being paged out
> 16 pages wired
> 2000987 pages zeroed
> 4 pages reserved for pagedaemon
> 6 pages reserved for kernel
> 16830030 swap pages
> 0 swap pages in use
> 0 total anon's in system
> 0 free anon's
> 119821710 page faults
> 119081179 traps
> 474725184 interrupts
> 510456927 cpu context switches
> 255355 fpu context switches
> 3224063 software interrupts
> 317594717 syscalls
> 0 pagein operations
> 329258 forks
> 640 forks where vmspace is shared
> 37 kernel map entries
> 51141141 zeroed page hits
> 2222 zeroed page misses
> 0 number of times the pagedaemon woke up
> 0 revolutions of the clock hand
> 0 pages freed by pagedaemon
> 0 pages scanned by pagedaemon
> 0 pages reactivated by pagedaemon
> 0 busy pages found by pagedaemon
> 14089163 total name lookups
> cache hits (90% pos + 9% neg) system 0% per-directory
> deletions 0%, falsehits 0%, toolong 0%
> 0 select collisions
> interrupt total rate
> irq0/clock 26133597 798
> irq0/ipi 1582624 48
> irq144/acpi0 2 0
> irq113/em0 21289414 650
> irq114/em1 232449860 7106
> irq116/ix1 220960739 6754
> irq101/ehci0 51 0
> irq104/ehci1 55 0
> irq105/ahci0 25073 0
> Total 502441415 15360
>
>
> [LIVE]root@ar1300:~# 
vmstat
> procs memory page disks traps cpu
> r b w avm fre flt re pi po fr sr sd0 sd1 int sys cs us sy id
> 1 1 0 54960 64032024 3662 0 0 0 0 0 0 0 14513 9707 15606 0 9 91
>
>
> [LIVE]root@ar1300:~# vmstat
> procs memory page disks traps cpu
> r b w avm fre flt re pi po fr sr sd0 sd1 int sys cs us sy id
> 3 2 0 53392 63829852 3601 0 0 0 0 0 0 0 1903 2971 2430 0 10 90
>
>
> [LIVE]root@ar1300:~# sysctl net.inet.ip.ifq.maxlen
> net.inet.ip.ifq.maxlen=2048
> [LIVE]root@ar1300:~# sysctl net.inet.ip.ifq.len
> net.inet.ip.ifq.len=0
> [LIVE]root@ar1300:~# sysctl net.inet.ip.ifq.drops
> net.inet.ip.ifq.drops=66419
>
>
> Regards,
>
> *Kevin Gee*
>
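Since the congestion counter seems to be the leading indicator here ("usually approx 0.7/s, problems much above 1"), it may be worth watching it continuously rather than eyeballing collectd after the fact. A minimal sketch - the 1.0/s threshold is taken from the observation above, and running it from cron is my own assumption, not a pfctl feature:

```shell
#!/bin/sh
# Read `pfctl -si` output on stdin; print the congestion rate and exit
# non-zero when it exceeds the threshold (default 1.0/s).
check_congestion() {
    threshold=${1:-1.0}
    awk -v t="$threshold" '
        $1 == "congestion" {
            rate = $3
            sub(/\/s$/, "", rate)              # "0.4/s" -> "0.4"
            printf "congestion rate: %s/s\n", rate
            exit ((rate + 0 > t + 0) ? 1 : 0)
        }'
}

# Illustrative use (not exercised here):
#   pfctl -si | check_congestion 1.0 || logger "PF congestion rate above 1/s"
```

Logging the alongside timestamps of any `pfctl -f` runs would show directly whether reloads and congestion spikes line up.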

