Hi all,

We have an oldish (2013) but well-spec'd pair of servers (active-backup),
running OpenBSD 6.0 and PF.
The only difference between the server hardware is that the primary has two
physical processors, the secondary has one.

This primary firewall is worked pretty hard (see pfctl -si below) and of
late it seems to struggle when load increases.
If we fail over to the secondary we see jitter/dropped packets on priority
traffic.
If we reload PF we often see jitter (testing with world-ping) and drops of
icmp (which has prio 7) which seem to settle if we reload PF again.
If I make a change to the queue config, I need to reboot.

Problems seem to coincide with spikes in congestion. Congestion is usually
approx 0.7/s; if it rises much above 1/s we see problems.

Most of the CPU cores aren't used much. Two of the 8 cores average about
40%; one went up to 75% when I had a problem with a ruleset.

I am trying to get hard figures rather than a 'feeling'. These are the stats
that seem high when there are problems (see the vmstat output at the bottom;
when things are relatively quiet, context switches and interrupts are <3000):

context switches > 15,000/s
interrupts > 14,000/s
PF state searches > 500,000/s
net.inet.ip.ifq.len is usually < 100 (I've seen it at > 700 briefly). This
seems to suggest that changing net.inet.ip.ifq.maxlen may not make a
difference.
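
To get figures for spikes rather than long-run averages, something like the
following could sample the PF counters at a fixed interval. It is only a
sketch: it assumes the pfctl -si layout shown in the output further down
(counter name in the first column, cumulative total in the second), and the
helper names pf_counter and rate_per_sec are made up for illustration. The
Rate column that pfctl prints appears to be the total divided by the time PF
has been enabled (14523 congestion events over 9h04m42s is about 0.4/s), so
short spikes are averaged away there.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of pfctl.

# Read `pfctl -si` output on stdin and print the cumulative total
# (second column) for the counter named in $1, e.g. "congestion".
pf_counter() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Per-second rate between two cumulative readings ($1 then $2)
# taken $3 seconds apart.
rate_per_sec() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Sampling loop; only runs when the script is called with "watch".
if [ "${1:-}" = "watch" ]; then
    interval=${2:-10}
    prev=$(pfctl -si | pf_counter congestion)
    while sleep "$interval"; do
        cur=$(pfctl -si | pf_counter congestion)
        echo "$(date '+%H:%M:%S') congestion/s=$(rate_per_sec "$prev" "$cur" "$interval")"
        prev=$cur
    done
fi
```

Saved under a name such as pfrate.sh (hypothetical), running
"sh pfrate.sh watch 10" as root would print one timestamped line every 10
seconds, which can be lined up against the times we see jitter.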

FWIW the ruleset as loaded is around 1300 lines when displayed with pfctl
-vvsr

I am looking for ways to optimize performance and would appreciate any
suggestions as to what to try and what stats to look at.
The alternative is to buy new hardware, but I need to be convinced a faster
processor will make a big difference.

1) I am thinking of trying higher values of net.inet.ip.ifq.maxlen,
currently 2048. I tried 2500, didn't see much difference but suspect I can
go quite a bit higher. Does this setting require a reboot and am I right in
thinking this may help congestion, lower interrupts and context-switching?
2) Memory use is low according to collectd/snmp graphs. We have plenty; can
we utilise it more?
3) Is an upgrade to OpenBSD 6.1 likely to make a significant difference?
4) We log all dropped traffic to pflog0, will disk I/O be a problem?

Sorry for vagueness, thanks in advance.
Kevin.



Possibly useful output and spec below:

Hardware:

2 x Quad Core Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz, 3600.54 MH
OpenBSD 6.0 GENERIC.MP#2 amd64


NICs
Inside: type ix, 10Gbps, e.g. ix1 at pci5 dev 0 function 1 "Intel 82599"
rev 0x01
Outside and pfsync type em 1Gbps e.g. em1 at pci2 dev 0 function 1 "Intel
I350" rev 0x01

Of the 8 cores, two average about 40% utilisation.  One of them peaked at
about 75% when struggling.
Memory = 64GB


[LIVE]root@ar1300:~# pfctl -si
Status: Enabled for 0 days 09:04:42              Debug: err

State Table                          Total             Rate
  current entries                  1205635
  searches                     16678281544       510320.1/s
  inserts                        157481830         4818.6/s
  removals                       156276195         4781.7/s
Counters
  match                          149125447         4562.9/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                               3395            0.1/s
  normalize                            296            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                         14523            0.4/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                    103949            3.2/s
  state-insert                       10397            0.3/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
  translate                              0            0.0/s
  no-route                               0            0.0/s

[LIVE]root@ar1300:~# vmstat -si
       4096 bytes per page
   16257397 pages managed
   16007891 pages free
      13720 pages active
       4146 pages inactive
          0 pages being paged out
         16 pages wired
    2000987 pages zeroed
          4 pages reserved for pagedaemon
          6 pages reserved for kernel
   16830030 swap pages
          0 swap pages in use
          0 total anon's in system
          0 free anon's
  119821710 page faults
  119081179 traps
  474725184 interrupts
  510456927 cpu context switches
     255355 fpu context switches
    3224063 software interrupts
  317594717 syscalls
          0 pagein operations
     329258 forks
        640 forks where vmspace is shared
         37 kernel map entries
   51141141 zeroed page hits
       2222 zeroed page misses
          0 number of times the pagedaemon woke up
          0 revolutions of the clock hand
          0 pages freed by pagedaemon
          0 pages scanned by pagedaemon
          0 pages reactivated by pagedaemon
          0 busy pages found by pagedaemon
   14089163 total name lookups
            cache hits (90% pos + 9% neg) system 0% per-directory
            deletions 0%, falsehits 0%, toolong 0%
          0 select collisions
interrupt                       total     rate
irq0/clock                   26133597      798
irq0/ipi                      1582624       48
irq144/acpi0                        2        0
irq113/em0                   21289414      650
irq114/em1                  232449860     7106
irq116/ix1                  220960739     6754
irq101/ehci0                       51        0
irq104/ehci1                       55        0
irq105/ahci0                    25073        0
Total                       502441415    15360
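
To see which devices the interrupts come from, the table above can be reduced
to percentages. A rough sketch, assuming the "irqN/device total rate" line
format shown above; irq_share is a made-up helper name:

```sh
#!/bin/sh
# Sketch only: irq_share is a hypothetical helper.
# Reads `vmstat -i` (or `vmstat -si`) output on stdin and prints each
# interrupt source with its share of all interrupts, largest first.
# Only lines whose first field starts with "irq" are counted, so the
# "Total" line and the other statistics are ignored.
irq_share() {
    awk '$1 ~ /^irq/ { total[$1] = $2; sum += $2 }
         END {
             if (sum == 0) exit
             for (src in total)
                 printf "%-16s %5.1f%%\n", src, 100 * total[src] / sum
         }' | sort -rn -k2
}
```

For example, "vmstat -i | irq_share". On the totals above, em1 and ix1
together account for roughly 90% of all interrupts.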


[LIVE]root@ar1300:~# vmstat
 procs    memory       page                    disks    traps          cpu
 r b w    avm     fre  flt  re  pi  po  fr  sr sd0 sd1  int   sys   cs us sy id
 1 1 0  54960 64032024 3662   0   0   0   0   0   0   0 14513  9707 15606  0  9 91


[LIVE]root@ar1300:~# vmstat
 procs    memory       page                    disks    traps          cpu
 r b w    avm     fre  flt  re  pi  po  fr  sr sd0 sd1  int   sys   cs us sy id
 3 2 0  53392 63829852 3601   0   0   0   0   0   0   0 1903  2971 2430  0 10 90




[LIVE]root@ar1300:~# sysctl net.inet.ip.ifq.maxlen
net.inet.ip.ifq.maxlen=2048
[LIVE]root@ar1300:~# sysctl net.inet.ip.ifq.len
net.inet.ip.ifq.len=0
[LIVE]root@ar1300:~# sysctl net.inet.ip.ifq.drops
net.inet.ip.ifq.drops=66419
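
The single readings above don't show when the drops happen. A sketch of how
the queue length and new drops per interval could be logged, assuming sysctl
prints "name=value" as above; kv_value is a made-up helper name:

```sh
#!/bin/sh
# Sketch only: kv_value is a hypothetical helper.
# Reads "key=value" lines (the format sysctl prints) on stdin and
# prints the value for the key named in $1.
kv_value() {
    awk -F= -v key="$1" '$1 == key { print $2; exit }'
}

# Logging loop; only runs when the script is called with "watch".
if [ "${1:-}" = "watch" ]; then
    interval=${2:-5}
    prev=$(sysctl net.inet.ip.ifq.drops | kv_value net.inet.ip.ifq.drops)
    while sleep "$interval"; do
        len=$(sysctl net.inet.ip.ifq.len | kv_value net.inet.ip.ifq.len)
        cur=$(sysctl net.inet.ip.ifq.drops | kv_value net.inet.ip.ifq.drops)
        # drops is cumulative, so report only what was added this interval
        echo "$(date '+%H:%M:%S') len=$len new_drops=$((cur - prev))"
        prev=$cur
    done
fi
```

If new_drops only rises while len is close to maxlen, that would point at the
queue limit; if drops rise while len stays low, raising maxlen is unlikely to
be the fix.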




Regards,

*Kevin Gee*
