Hi All,
Following on from my previous post.
I needed to reload our PF ruleset today (pfctl -vvsr shows about 1,270
rules in total, FWIW).

As soon as I ran 'pfctl -f /etc/pf.conf', our external measurements showed a
lot of jitter through the firewall; before that, jitter had been minimal for
hours - nice flat graphs of ping responses :)
Looking at our collectd graphs for this period, congestion went from minimal
to hovering around 60-80 packets/s, and state searches leapt from an average
of about 400K/s to around 1.2M/s.
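
For reference, the reload itself is nothing unusual - roughly this, with the
-n dry run just parsing the file as a sanity check before the real load:

    # validate the ruleset without loading it, then load it for real
    pfctl -nf /etc/pf.conf && pfctl -f /etc/pf.conf
    # snapshot the counters straight after the reload for comparison
    pfctl -si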

Sometimes reloading the PF ruleset again cures this, so I wondered if it is
related to optimization (which we have not set, so it is at the default
value), but in this case I had to reboot the firewall.
Based on this, does anyone have any idea where to look to identify the
problem? Basically, reloading pf.conf seems to be what induces the
performance problems, and a reboot seems to be the most certain cure.
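
To be clear, by 'optimization' I mean the global option in pf.conf; we have
no such line, so as I understand it we are on the default profile. Setting it
explicitly would just be a one-liner like this (value illustrative):

    # in /etc/pf.conf - global options go before the rules
    set optimization normal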

Regards,

*Kevin Gee*

On 31 May 2017 at 11:10, Kevin Gee <[email protected]> wrote:

> Hi all,
>
> We have an oldish (2013) but well-spec'd pair of servers (active-backup)
> running OpenBSD 6.0 and PF.
> The only difference between the servers' hardware is that the primary has
> two physical processors and the secondary has one.
>
> This primary firewall is worked pretty hard (see pfctl -si below) and of
> late it seems to struggle when load increases.
> If we fail over to the secondary we see jitter/dropped packets on priority
> traffic.
> If we reload PF we often see jitter (testing with world-ping) and drops of
> ICMP (which has prio 7), which seem to settle if we reload PF again.
> If I make a change to the queue config, I need to reboot.
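>
> For context, the prio assignment is a one-liner along these lines (the
> match criteria here are illustrative, not our exact rule):
>
>     # give ICMP the highest priority (7)
>     match proto icmp set prio 7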
>
> Problems seem to coincide with spikes in the congestion counter: it is
> usually approx 0.7/s, and if it rises much above 1/s we see problems.
>
> Most of the CPU cores aren't used much; two of the 8 cores average about
> 40%, and one went up to 75% when I had a problem with a ruleset.
>
> I am trying to get hard figures rather than a 'feeling'. The stats below
> are what I see when there are problems (see the vmstat output at the
> bottom; when things are relatively quiet, context switches and interrupts
> are <3000).
>
> context switches > 15,000
> interrupts > 14,000
> searches > 500,000
> net.inet.ip.ifq.len is usually < 100 (I've seen it at >700 briefly), which
> seems to suggest that changing net.inet.ip.ifq.maxlen may not make a
> difference.
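>
> (In case it helps, I'm sampling those numbers with something like this -
> the interval is arbitrary:)
>
>     #!/bin/sh
>     # sample queue depth/drops and the pf counters every 5 seconds
>     while sleep 5; do
>         date
>         sysctl net.inet.ip.ifq.len net.inet.ip.ifq.drops
>         pfctl -si | grep -e searches -e congestion
>     done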
>
> FWIW the ruleset as loaded is around 1300 lines when displayed with pfctl
> -vvsr
>
> I am looking for ways to optimize performance and would appreciate any
> suggestions as to what to try and what stats to look at.
> The alternative is to buy new hardware, but I need to be convinced that a
> faster processor will make a big difference.
>
> 1) I am thinking of trying higher values of net.inet.ip.ifq.maxlen
> (currently 2048). I tried 2500 and didn't see much difference, but I
> suspect I can go quite a bit higher (see the sketch after this list). Does
> this setting require a reboot, and am I right in thinking this may help
> congestion and lower interrupts and context switching?
> 2) Memory use is low according to the collectd/snmp graphs; we have plenty
> - can we utilise it more?
> 3) Is an upgrade to OpenBSD 6.1 likely to make a significant difference?
> 4) We log all dropped traffic to pflog0 (see below) - will disk I/O be a
> problem?
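>
> For 1) and 4), this is what I have in mind - as far as I know the sysctl
> can be changed at runtime, with /etc/sysctl.conf making it persistent
> across reboots, but corrections welcome:
>
>     # try a larger input queue at runtime (no reboot, if I've read it right)
>     sysctl net.inet.ip.ifq.maxlen=4096
>     # persist the setting across reboots
>     echo 'net.inet.ip.ifq.maxlen=4096' >> /etc/sysctl.conf
>
>     # read back the dropped traffic that pflogd records on pflog0
>     tcpdump -n -e -ttt -i pflog0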
>
> Sorry for the vagueness; thanks in advance.
> Kevin.
>
>
>
> Possibly useful output and spec below:
>
> Hardware:
>
> 2 x Quad Core Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz, 3600.54 MHz
> OpenBSD 6.0 GENERIC.MP#2 amd64
>
>
> NICs
> Inside: type ix, 10Gbps, e.g. ix1 at pci5 dev 0 function 1 "Intel 82599"
> rev 0x01
> Outside and pfsync: type em, 1Gbps, e.g. em1 at pci2 dev 0 function 1
> "Intel I350" rev 0x01
>
> Of the 8 cores, two average about 40% utilisation.  One of them peaked at
> about 75% when struggling.
> Memory = 64GB
>
>
> [LIVE]root@ar1300:~# pfctl -si
> Status: Enabled for 0 days 09:04:42              Debug: err
>
> State Table                          Total             Rate
>   current entries                  1205635
>   searches                     16678281544       510320.1/s
>   inserts                        157481830         4818.6/s
>   removals                       156276195         4781.7/s
> Counters
>   match                          149125447         4562.9/s
>   bad-offset                             0            0.0/s
>   fragment                               0            0.0/s
>   short                               3395            0.1/s
>   normalize                            296            0.0/s
>   memory                                 0            0.0/s
>   bad-timestamp                          0            0.0/s
>   congestion                         14523            0.4/s
>   ip-option                              0            0.0/s
>   proto-cksum                            0            0.0/s
>   state-mismatch                    103949            3.2/s
>   state-insert                       10397            0.3/s
>   state-limit                            0            0.0/s
>   src-limit                              0            0.0/s
>   synproxy                               0            0.0/s
>   translate                              0            0.0/s
>   no-route                               0            0.0/s
>
> [LIVE]root@ar1300:~# vmstat -si
>        4096 bytes per page
>    16257397 pages managed
>    16007891 pages free
>       13720 pages active
>        4146 pages inactive
>           0 pages being paged out
>          16 pages wired
>     2000987 pages zeroed
>           4 pages reserved for pagedaemon
>           6 pages reserved for kernel
>    16830030 swap pages
>           0 swap pages in use
>           0 total anon's in system
>           0 free anon's
>   119821710 page faults
>   119081179 traps
>   474725184 interrupts
>   510456927 cpu context switches
>      255355 fpu context switches
>     3224063 software interrupts
>   317594717 syscalls
>           0 pagein operations
>      329258 forks
>         640 forks where vmspace is shared
>          37 kernel map entries
>    51141141 zeroed page hits
>        2222 zeroed page misses
>           0 number of times the pagedaemon woke up
>           0 revolutions of the clock hand
>           0 pages freed by pagedaemon
>           0 pages scanned by pagedaemon
>           0 pages reactivated by pagedaemon
>           0 busy pages found by pagedaemon
>    14089163 total name lookups
>             cache hits (90% pos + 9% neg) system 0% per-directory
>             deletions 0%, falsehits 0%, toolong 0%
>           0 select collisions
> interrupt                       total     rate
> irq0/clock                   26133597      798
> irq0/ipi                      1582624       48
> irq144/acpi0                        2        0
> irq113/em0                   21289414      650
> irq114/em1                  232449860     7106
> irq116/ix1                  220960739     6754
> irq101/ehci0                       51        0
> irq104/ehci1                       55        0
> irq105/ahci0                    25073        0
> Total                       502441415    15360
>
>
> [LIVE]root@ar1300:~# vmstat
>  procs    memory       page                    disks    traps          cpu
>  r b w    avm     fre  flt  re  pi  po  fr  sr sd0 sd1   int   sys    cs us sy id
>  1 1 0  54960 64032024 3662   0   0   0   0   0   0   0 14513  9707 15606  0  9 91
>
>
> [LIVE]root@ar1300:~# vmstat
>  procs    memory       page                    disks    traps          cpu
>  r b w    avm     fre  flt  re  pi  po  fr  sr sd0 sd1  int   sys   cs us
> sy id
>  3 2 0  53392 63829852 3601   0   0   0   0   0   0   0 1903  2971 2430  0
> 10 90
>
>
>
>
> [LIVE]root@ar1300:~# sysctl net.inet.ip.ifq.maxlen
> net.inet.ip.ifq.maxlen=2048
> [LIVE]root@ar1300:~# sysctl net.inet.ip.ifq.len
> net.inet.ip.ifq.len=0
> [LIVE]root@ar1300:~# sysctl net.inet.ip.ifq.drops
> net.inet.ip.ifq.drops=66419
>
>
>
>
> Regards,
>
> *Kevin Gee*
>
>
