Hi Henning,

Thank you for your reply. After looking back through our configs I 
have removed the cal***** changes.

The things I set (and have now removed) from that site were:
# Custom Speed Tweaks
kern.bufcachepercent=75        # Allow the kernel to use up to 75% of RAM for buffer cache (default 10%)
net.inet.ip.ifq.maxlen=1536    # Maximum allowed input queue length (256 * number of physical interfaces)
net.inet.udp.recvspace=131072  # Increase UDP receive buffer size; good for 200Mbit without packet drops
net.inet.udp.sendspace=131072  # Increase UDP send buffer size; good for 200Mbit without packet drops
net.inet.tcp.mssdflt=1460      # Set the default MSS (for MTU=1500)
net.inet.tcp.rfc3390=1         # RFC 3390: increase TCP's initial congestion window to 14600 for SPDY

Removing these changes made no difference to the performance.
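For reference, this is roughly how I confirmed the knobs had reverted after commenting them out of /etc/sysctl.conf and rebooting (just a sketch; sysctl(8) accepts multiple variable names and prints the current value of each):

```shell
# Confirm each tweaked knob is back at its stock value
# (compare against a box that never had the tweaks applied):
sysctl kern.bufcachepercent \
       net.inet.ip.ifq.maxlen \
       net.inet.udp.recvspace net.inet.udp.sendspace \
       net.inet.tcp.mssdflt net.inet.tcp.rfc3390
```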

I read 'The Book of PF' when I was first learning OpenBSD and how to 
write PF, HFSC and so on, and it all works beautifully.
I have also read the attached PostScript file 'tuning-openbsd.ps', and 
this page 
http://www.pantz.org/software/openbsd/runningandtunningopenbsd.html, to 
name only a few of the sources I have read over the years (I know these 
references are very old now and not necessarily accurate).

I have checked all the usual things to make sure that I have enough 
mbufs, large enough table sizes and so on; all seems well and I am not 
running out of any other resources.
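To be concrete, the checks I mean are along these lines (a sketch; exact output fields vary by release):

```shell
netstat -m    # mbuf and mbuf-cluster usage vs. configured limits
pfctl -si     # pf state-table counters, searches/inserts per second
pfctl -sm     # pf memory limits (states, src-nodes, frags, tables)
```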
A look at the output of systat, top and friends shows that PF barely 
registers a CPU percentage, while the interrupts on CPU0 stick at 100% 
when throughput goes over ~750Mbit. The performance ceiling seems to 
correlate with CPU0's utilisation.
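This is how I've been watching the interrupt load (again a sketch; I'm assuming the NIC shows up under its em(4) device names in the interrupt listing):

```shell
vmstat -i       # per-device interrupt totals and rates since boot
systat vmstat 1 # live per-device interrupt rates, 1s refresh
top -S          # include system processes; shows interrupt time per CPU
```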

I appreciate that you may be frustrated by the existence of bad advice 
on the internet. As someone who is continually learning and only wants 
to do things right: rather than saying he's an idiot who knows nothing, 
could you please give some constructive examples of the sort of things 
cal**** has got wrong, so we can all learn?
I cannot see anything that stands out as bad advice, but I appreciate 
there must be something, otherwise you wouldn't say that.

I was asking about ToE offloading etc. in the hope that it might help 
bring our interrupt CPU utilisation down a little, as I lack better 
knowledge of the OpenBSD network stack internals. I changed the network 
card from an old legacy-interrupt card to a new Intel ET2, which uses 
MSI (message signalled interrupts), but this made no improvement to the 
maximum throughput.

Regarding the missed step, I don't know which diagnostics/stats to 
provide here in the hope of some help. What would be most useful?
Is there a way of seeing what the interrupts are doing in more detail? 
systat shows I'm currently averaging around 24k interrupts overall 
at 85% interrupt utilisation (~500Mbit).

Someone did previously (and very helpfully) point out that the 
~400,000pps we are getting on our HP DL160 G6s is pretty good. Because 
I like OpenBSD so much I have managed to convince my manager to invest 
in faster hardware with the fastest single-CPU speeds I can get my 
hands on, but I believe this is a poor approach to the problem (for 
the long term, anyway).

NB: This is all based on our traffic profile, which is not the same as 
others'. The traffic we generate is the result of running ~40 servers 
behind the OpenBSD firewalls which scrape and crawl the internet (we 
are an internet social-media search engine).

systat pf (currently only shifting around 500Mbit):

            TYPE NAME                          VALUE       RATE NOTES
              pf Status                      Enabled
              pf Since                     914:53:16
              pf Debug                           err
              pf Hostid                   0x7cee5e20

           state Count                        616822
           state searches                 633323382K  196904.28
           state inserts                   19859725K    6174.52
           state removals                  19859123K    6174.33

       src track Count                             0
       src track searches                          0       0.00
       src track inserts                           0       0.00
       src track removals                          0       0.00

         counter match                     19986626K    6213.97
         counter bad-offset                        0       0.00
         counter fragment                     193784       0.06
         counter short                          4606       0.00
         counter normalize                    243051       0.07
         counter memory                            0       0.00
         counter bad-timestamp                     0       0.00
         counter congestion                178267231      54.13
         counter ip-option                    567580       0.17
         counter proto-cksum                       0       0.00
         counter state-mismatch             43494091      13.21
         counter state-insert                      0       0.00
         counter state-limit                       0       0.00
         counter src-limit                         0       0.00
         counter synproxy                          0       0.00

   limit counter max states per rule               0       0.00
   limit counter max-src-states                    0       0.00
   limit counter max-src-nodes                     0       0.00
   limit counter max-src-conn                      0       0.00
   limit counter max-src-conn-rate                 0       0.00
   limit counter overload table insertion          0       0.00
   limit counter overload flush states             0       0.00

Thanks for your time and reading this far :)
Kind regards, Andrew Lemin


On Wed 26 Jun 2013 11:32:18 BST, Henning Brauer wrote:
> * andy <[email protected]> [2013-05-15 11:31]:
>> I run 12 OpenBSD firewalls, and I have an issue on my highest throughput
>> boxes. I have HP DL160 G6 boxes with Intel ET2 4 port NIC's.
>> I have a problem where I cannot run traffic any faster than ~700Mbit as I
>> am hitting 100% utilisation on the first core due to the giant big lock
>> trying to process the MSI interrupts.
>>
>> The traffic comprises lots of small-payload packets (currently running
>> around 300,000 to 400,000 pps) and I cannot run any faster.
>>
>> I have tuned the boxes as much as possible using information from
>> calomel.org etc and overall we have been extremely happy with them, except
>> for the performance limits.
>
> congratulations, by "using information" from a random idiot (who has
> very well and often demonstrated, last not least by the articles on
> said website, that he doesn't understand a single bit of what he's
> writing about) you made your systems slower.
>
>
>> I understand the devs want to keep the network stack in-house as there are
>> many network cards that simply screw things up, and it is this approach
>> which has given OBSD the stability and security reputation it has. But this
>> approach with the giant big lock limit imposes a hard performance limit for
>> OBSD. But I do also understand that the devs realise this and as a short
>> term solution until the kernel becomes true SMP, they have started to
>> implement ToE and offloading for some NICs :D :)
>>
>> Can you please tell me when ToE support will be added for the Intel series
>> of cards? We are going to have to abandon OBSD if it cannot perform at the
>> throughputs we need but I really want to stay with OBSD? I am not a
>> developer and so cannot contribute myself to any efforts (believe me I
>> would if i could!)..
>
> now let's revisit that.
>
> 1) you have a performance problem
> 2) [ hint: you miss a step here ]
> 3) you think ToE is the solution (why?)
>
> hmpf.
>
> let me put it straight: your idea that ToE would be the answer is
> plain wrong. there is much more headroom in OpenBSD than what you are
> running, but that requires analysis, thought and probably help by an
> experienced person who really understands pf.

[demime 1.01d removed an attachment of type application/postscript which had a 
name of tuning-openbsd.ps]
