Hi Henning,

Thank you for your reply. After looking back through our configs I have removed the cal***** changes.
The things I set (and have now removed) from that site were:

# Custom Speed Tweaks
kern.bufcachepercent=75        # Allow the kernel to use up to 90% of the RAM for cache (default 10%)
net.inet.ip.ifq.maxlen=1536    # Maximum allowed input queue length (256*number of physical interfaces)
net.inet.udp.recvspace=131072  # Increase UDP "receive" buffer size. Good for 200Mbit without packet drop.
net.inet.udp.sendspace=131072  # Increase UDP "send" buffer size. Good for 200Mbit without packet drop.
net.inet.tcp.mssdflt=1460      # Set the default MSS (MTU=1500)
net.inet.tcp.rfc3390=1         # RFC3390: increase TCP's initial congestion window to 14600 for SPDY

Removing these changes made no difference to the performance.

I read 'The Book of PF' when I was first learning OpenBSD and how to write PF, HFSC etc., and it all works beautifully. I have also read the attached ps file 'tuning-openbsd.ps', and this page http://www.pantz.org/software/openbsd/runningandtunningopenbsd.html, to name only a few of the sources I have read over the years (I know these references are very old now and not necessarily accurate).

I have checked all the usual things to make sure that I have enough mbufs, table sizes etc., and all seems well; I am not running out of any other resources. A look at all the pages from systat, top etc. shows that PF barely registers a CPU percentage, while the interrupts on CPU0 stick to 100% when throughput goes over ~750Mbit. The performance ceiling seems to correlate with CPU0's utilisation.

I appreciate that you may be frustrated by the existence of bad advice on the internet. As someone who is continually learning and only wants to do things right, could you, instead of saying that he's an idiot who knows nothing, please provide some constructive examples of what sort of things cal**** have got wrong, so we can all learn? I cannot see anything that stands out as bad advice, but I appreciate there must be some, otherwise you wouldn't say that.
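To put some numbers on the CPU0 ceiling described above, here is a back-of-the-envelope calculation. It uses only the figures quoted in this thread (~400,000 pps and ~750Mbit at 100% on CPU0); the assumption that per-packet cost stays roughly constant as throughput scales is mine, not anything measured.

```python
# Rough per-packet CPU budget estimate from the figures in this thread.
# Assumption (mine): CPU0 is fully consumed by interrupt/stack work at the
# quoted ceiling, and average packet size stays constant as throughput grows.

PPS_CEILING = 400_000    # packets/s where the box tops out
BITS_CEILING = 750e6     # ~750 Mbit/s, where CPU0 hits 100%

budget_us = 1e6 / PPS_CEILING                 # CPU0 time available per packet
avg_pkt_bytes = BITS_CEILING / 8 / PPS_CEILING

# pps required for 1 Gbit/s at the same average packet size, and the
# single-core speedup that would imply under the linear-cost assumption.
pps_for_gbit = 1e9 / 8 / avg_pkt_bytes
speedup_needed = pps_for_gbit / PPS_CEILING

print(f"per-packet CPU budget:      {budget_us:.2f} us")
print(f"average packet size:        {avg_pkt_bytes:.0f} bytes")
print(f"pps needed for 1 Gbit/s:    {pps_for_gbit:,.0f}")
print(f"single-core speedup needed: {speedup_needed:.2f}x")
```

Under those assumptions each packet gets about 2.5 microseconds of CPU0 time, which illustrates why faster single-core clocks help but only linearly, and why a small per-packet saving matters more than any one sysctl.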
I was asking about the TOE offloading etc. in the hope that it might help a little to bring our interrupt CPU utilisation down, without better knowledge of the OpenBSD net stack internals. I changed the network card from an old legacy-interrupt style card to a new Intel ET2, which uses MSI (message signalled interrupts), but this made no improvement to the maximum throughput.

Regarding the missed step, I don't know which diagnostics/stats to provide here in the hope of some help. What would be most useful? Is there a way of seeing what the interrupts are doing in more detail? systat shows I'm currently running on average 24k interrupts overall for 85% interrupt utilisation (~500Mbit).

Someone did previously (and very helpfully) indicate that the ~400,000pps we are getting on our HP DL160 G6's is pretty good. Because I like OpenBSD so much I have managed to convince my manager to invest in faster hardware with the fastest single-CPU speeds I can get my hands on, but I believe this is a poor approach to the problem (for the long term anyway).

NB: This is all based on our traffic profile, which is not the same as others'. The traffic we generate is the result of running ~40 servers behind the OpenBSD firewalls which scrape and crawl the internet (we are an internet social media search engine).
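For anyone willing to help with the diagnostics question, these are the standard OpenBSD tools I would reach for to get more interrupt and buffer detail. This is a sketch; exact output format and available OIDs vary by release, so please verify on your own box.

```shell
# Per-device interrupt counts and rates -- shows which NIC (and which
# interrupt source) is generating the load that pins CPU0.
vmstat -i

# mbuf and cluster usage, to rule out buffer exhaustion.
netstat -m

# The same counters as 'systat pf', in a form that is easy to paste
# into a mail.
pfctl -si

# IP input queue length, limit and drops -- the queue that pf's
# 'congestion' counter relates to.
sysctl net.inet.ip.ifq
```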
systat pf (currently only shifting around 500Mbit):

TYPE           NAME                       VALUE         RATE
pf             Status                     Enabled
pf             Since                      914:53:16
pf             Debug                      err
pf             Hostid                     0x7cee5e20
state          Count                      616822
state          searches                   633323382K    196904.28
state          inserts                    19859725K     6174.52
state          removals                   19859123K     6174.33
src track      Count                      0
src track      searches                   0             0.00
src track      inserts                    0             0.00
src track      removals                   0             0.00
counter        match                      19986626K     6213.97
counter        bad-offset                 0             0.00
counter        fragment                   193784        0.06
counter        short                      4606          0.00
counter        normalize                  243051        0.07
counter        memory                     0             0.00
counter        bad-timestamp              0             0.00
counter        congestion                 178267231     54.13
counter        ip-option                  567580        0.17
counter        proto-cksum                0             0.00
counter        state-mismatch             43494091      13.21
counter        state-insert               0             0.00
counter        state-limit                0             0.00
counter        src-limit                  0             0.00
counter        synproxy                   0             0.00
limit counter  max states per rule        0             0.00
limit counter  max-src-states             0             0.00
limit counter  max-src-nodes              0             0.00
limit counter  max-src-conn               0             0.00
limit counter  max-src-conn-rate          0             0.00
limit counter  overload table insertion   0             0.00
limit counter  overload flush states      0             0.00

Thanks for your time and reading this far :)

Kind regards,
Andrew Lemin

On Wed 26 Jun 2013 11:32:18 BST, Henning Brauer wrote:
> * andy <[email protected]> [2013-05-15 11:31]:
>> I run 12 OpenBSD firewalls, and I have an issue on my highest throughput
>> boxes. I have HP DL160 G6 boxes with Intel ET2 4 port NIC's.
>> I have a problem where I cannot run traffic any faster than ~700Mbit as I
>> am hitting 100% utilisation on the first core due to the giant big lock
>> trying to process the MSI interrupts.
>>
>> The traffic comprises of lots of small payload packets (currently running
>> around 300,000 to 400,000 pps) and I cannot run any faster.
>>
>> I have tunned the boxes as much as possible using information from
>> calomel.org etc and overall we have been extremely happy with them, expect
>> for the performance limits.
>
> congratulations, by "using information" from a random idiot (who has
> very well and often demonstrated, last not least by the articles on
> said website, that he doesn't understand a single bit of what he's
> writing about) you made your systems slower.
>
>
>> I understand the devs want to keep the network stack in-house as their are
>> many network cards that simply screw things up, and it is this approach
>> which has given OBSD the stability and security reputation it has. But this
>> approach with the giant big lock limit imposes a hard performance limit for
>> OBSD. But I do also understand that the devs realise this and as a short
>> term solution until the kernel becomes true SMP, they have started to
>> implement ToE and offloading for some NICs :D :)
>>
>> Can you please tell me when ToE support will be added for the Intel series
>> of cards? We are going to have to abandon OBSD if it cannot perform at the
>> throughputs we need but I really want to stay with OBSD? I am not a
>> developer and so cannot contribute myself to any efforts (believe me I
>> would if i could!)..
>
> now let's revisit that.
>
> 1) you have a performance problem
> 2) [ hint: you miss a step here ]
> 3) you think ToE is the solution (why?)
>
> hmpf.
>
> let me put it straight: your idea that ToE would be the answer is
> plain wrong. there is much more headroom in OpenBSD than what you are
> running, but that requires analysis, thought and probably help by an
> experienced person who really understands pf.

[demime 1.01d removed an attachment of type application/postscript which had a name of tuning-openbsd.ps]

