Hi

just a few updates on our problem. I tried a different setup this time: no iperf on the OpenBSD router, only IP forwarding.

I have one Linux box (kernel 2.6) as a load generator on each of the two VLANs routed by the OpenBSD machine, with no process running on them except top to monitor interrupts and load.

I modified iperf to generate very small packets: this time I want to pin down the bottleneck on OpenBSD.

The most I can push between the two Linux boxes before the router hits 100% interrupt load is about 140 Kpps (only ~5 MB/s):
[  4]  1.0- 2.0 sec  5.45 MBytes  45.7 Mbits/sec  0.009 ms    0/142862 (0%)
[  4]  2.0- 3.0 sec  5.45 MBytes  45.7 Mbits/sec  0.010 ms    0/142860 (0%)
[  4]  3.0- 4.0 sec  5.45 MBytes  45.7 Mbits/sec  0.009 ms    0/142841 (0%)
=> more packets/s from the sender would mean dropped packets on the router and loss at the receiver.
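As a sanity check on those numbers (the iperf command below is only an illustrative sketch; the -l value is an assumption, not the exact payload size I used):

```shell
# Hypothetical small-packet run (iperf 1.7's -l option sets the UDP
# payload length; 40 bytes is an assumed value for illustration):
#   ./iperf -c <receiver> -u -b 100M -l 40

# Back-of-the-envelope check: 5.45 MBytes/s (~5714739 bytes/s) at
# ~142860 pps gives the payload size per datagram:
echo $((5714739 / 142860))
```

That works out to about 40 bytes per datagram, consistent with 45.7 Mbit/s at 140 Kpps.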

And to demonstrate the maximum capability of the load generators, here is the same single-threaded iperf test between the two Linux boxes on the same VLAN (no router in between):
[  5]  1.0- 2.0 sec  12.6 MBytes    105 Mbits/sec  0.003 ms    0/329213 (0%)
[  5]  2.0- 3.0 sec  12.6 MBytes    106 Mbits/sec  0.003 ms    0/329980 (0%)
[  5]  3.0- 4.0 sec  12.6 MBytes    106 Mbits/sec  0.003 ms    0/330488 (0%)
330 Kpps without loss and one busy CPU on the Linux side. That is still more than twice the packet rate the router can sustain.

(And just for fun: despite the announced capability of FreeBSD to route 1 Mpps, our 5.3 on a dual Opteron is only able to route ~140 Kpps as well.)


So the conclusion may be that these BSD boxes are limited by the ability of the OS to handle the interrupt load...
What do you think about this?
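For what it's worth, here is how one can watch the interrupt load on the router itself with stock OpenBSD tools (just a diagnostic sketch):

```shell
# Cumulative per-device interrupt counts since boot:
vmstat -i

# Live CPU breakdown (including interrupt percentage) while a test runs:
systat vmstat
```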

Frederic

Frederic BRET wrote:

Hi all,

This is my first post to this list. I'm trying to understand why our OpenBSD PF router isn't able to cope with the gigabit speeds we need...

I have two Dell 1750 single-Xeon 2.8 GHz machines. The first is our production router, running OpenBSD 3.4-beta with PF for two years now; the second is a fresh OpenBSD 3.7 with the stock GENERIC kernel. The ultimate goal is to build a CARP dual-router setup with the two machines.

The problem is that neither machine is able to route at speeds higher than ~350 Mbit/s, even without PF (which could slow things down, though I doubt it).

In order to validate the capacity of the server to cope with simultaneous up/down gigabit streams, I've done several tests.

- First, validate the external test machine and the network.
Here is a simultaneous (-d) iperf TCP test between two Sun V40Zs (SLES9 with Broadcom 5703). Between them sits an HP ProCurve 2824 gigabit switch with full duplex enabled and properly negotiated on all ports:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <Linux iperf server address> -d -w 256k
../..
[  4]  0.0-10.0 sec  1.01 GBytes    864 Mbits/sec
[  5]  0.0-10.0 sec  1.01 GBytes    865 Mbits/sec
=> The network AND the V40Zs are capable of near full-duplex gigabit in both directions. OK.

- This being said, let's do the same thing between a V40Z and a Dell 1750 (OpenBSD 3.7 with Broadcom 5704).
First, a non-simultaneous (-r) TCP test between the V40Z and a 1750:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <OpenBSD iperf server address> -r -w 256k
../..
[  4]  0.0-10.0 sec  1.09 GBytes    935 Mbits/sec
[  4]  0.0-10.0 sec  1.09 GBytes    938 Mbits/sec
=> More than 1 GB is transferred in 10 s, first in one direction and then in the other. The unidirectional bandwidth comes close to the 1 Gbit/s line rate, no problem.

- Now let's run both directions simultaneously (-d) between the V40Z and the Dell 1750, like the first iperf test between the two Linux boxes:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <OpenBSD iperf server address> -d -w 256k
../..
[  4]  0.0-10.0 sec    403 MBytes    338 Mbits/sec
[  5]  0.0-10.0 sec  1.02 GBytes    876 Mbits/sec
=> The OpenBSD box is never able to receive more than ~330 Mbit/s while it is transmitting on the wire at the same time. The behaviour is consistent across every run.

- Seeing this behaviour, let's try UDP transfers to determine the speed at which the problem begins. I set the Linux client to send and receive at ~46 Mbit/s:
ROOT:Linux: > ./iperf -i 1 -w 256k -c <OpenBSD iperf server address> -b 46M -d -u

On the OpenBSD box we can see this:
ROOT:OpenBSD: > ./iperf -i 1 -s -u -w 256k
../..
[ ID] Interval       Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]  0.0-10.0 sec  54.5 MBytes  45.7 Mbits/sec  0.264 ms    372/39217 (0.95%)
[  7]  0.0-10.0 sec  55.0 MBytes  46.1 Mbits/sec  0.002 ms    0/39217 (0%)
=> We begin to lose inbound packets on the OpenBSD box at as little as 46 Mbit/s, while outbound packets still go out without problems.
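The loss column iperf prints is simply lost/total; checking the 0.95% above:

```shell
# 372 lost out of 39217 datagrams, in thousandths of a percent
# (948 -> 0.948%, which iperf rounds to 0.95%):
echo $((372 * 100000 / 39217))
```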

Of course that's only the beginning, because here is what we get with a stream of 800 Mbit/s:
ROOT:ob35bckp:/root/compile/iperf-1.7.0 > ./iperf -i 1 -s -u -w 256k
../..
[ ID] Interval       Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  7]  0.0-10.0 sec   976 MBytes   819 Mbits/sec  0.013 ms    0/696200 (0%)
[  5]  0.0-10.3 sec  79.4 MBytes  65.0 Mbits/sec  14.982 ms  657633/714260 (92%)
Now it's dramatic: the packet loss is 92%!

I suspect the problem is that no buffers are available for the network card: it receives packets but has nowhere to put them...

Here are a few elements to help the analysis.

During the iperf test, here's what netstat is saying :
ROOT:OpenBSD: > netstat -m
1263 mbufs in use:
      1142 mbufs allocated to data
      3 mbufs allocated to packet headers
      118 mbufs allocated to socket names and addresses
627/670/6144 mbuf clusters in use (current/peak/max)
1676 Kbytes allocated to network (93% in use)

And now, here is what it's saying while idle :
ROOT:OpenBSD: >netstat -m
1033 mbufs in use:
      1027 mbufs allocated to data
      3 mbufs allocated to packet headers
      3 mbufs allocated to socket names and addresses
512/724/6144 mbuf clusters in use (current/peak/max)
1784 Kbytes allocated to network (71% in use)

The OpenBSD box has had plenty of memory to handle thousands of PF states for many years now, as you can see in the dmesg of the 3.7 machine:
OpenBSD 3.7 (GENERIC) #50: Sun Mar 20 00:01:57 MST 2005
  [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Xeon(TM) CPU 2.80GHz ("GenuineIntel" 686-class) 2.79 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID
real mem  = 1073160192 (1048008K)
avail mem = 972750848 (949952K)
using 4278 buffers containing 53760000 bytes (52500K) of memory

The TCP and UDP buffers are no longer at their stock values:
ROOT:OpenBSD: > sysctl -a | grep space
net.inet.tcp.recvspace=131072
net.inet.tcp.sendspace=131072
net.inet.udp.recvspace=131072
net.inet.udp.sendspace=131072

NMBCLUSTERS has disappeared, but what I guess to be its successor (kern.maxclusters?) doesn't improve anything when increased.
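For completeness, this is the kind of runtime tuning I mean (example values only; kern.maxclusters being the NMBCLUSTERS successor is my guess, and none of this helped in my case):

```shell
# Enlarge the socket buffers and the mbuf cluster ceiling
# (illustrative values, to be run as root on the OpenBSD box):
sysctl net.inet.udp.recvspace=262144
sysctl net.inet.udp.sendspace=262144
sysctl kern.maxclusters=8192
```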

Following Henning Brauer's advice, I tried changing IFQ_MAXLEN in sys/net/if.c. Unfortunately it didn't fix the problem. With both the stable and current kernel sources I tried many values for IFQ_MAXLEN, up to 2000, without success. I still see drops at as little as 46 Mbit/s with UDP, and accordingly the drop counter from netstat -s increases:
      810148 datagrams received
      710297 dropped due to full socket buffers
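Those two counters say nearly everything arriving is being thrown away:

```shell
# 710297 of 810148 received datagrams dropped due to full socket
# buffers, i.e. about 87%:
echo $((710297 * 100 / 810148))
```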

One more test, to rule out different bge behaviour with and without jumbo frames: I have two interfaces on the Dell: bge0 (no IP, but MTU 8000 to allow full 1500-byte frames on the vlan interface), which carries one vlan interface, and bge1 configured natively (untagged) on another LAN. Whichever interface I use, the drops are the same...

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 8000
       address: 00:0d:56:fd:58:cd
       media: Ethernet 1000baseT full-duplex
       status: active
       inet6 fe80::20d:56ff:fefd:58cd%bge0 prefixlen 64 scopeid 0x1
vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
       address: 00:0d:56:fd:58:cd
       vlan: 10 parent interface: bge0
       inet6 fe80::20d:56ff:fefd:58cd%vlan10 prefixlen 64 scopeid 0x7
       inet 10.10.0.254 netmask 0xffff0000 broadcast 10.10.255.255

bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
       address: 00:0d:56:fd:58:ce
       media: Ethernet autoselect (1000baseT full-duplex)
       status: active
       inet 10.1.0.254 netmask 0xffff0000 broadcast 10.1.255.255
       inet6 fe80::20d:56ff:fefd:58ce%bge1 prefixlen 64 scopeid 0x2

The fact is, the card has no problem receiving a gigabit stream as long as it isn't transmitting at the same time...


That said, I really don't know what to try any more...

Has any of you encountered and solved this problem before? I'm pretty sure I'm not the only one to have it...

Thanks in advance!

Frederic



--
__________________________________________________
Frederic BRET - Universite de La Rochelle Centre de Ressources Informatiques
Technoforum - Avenue Einstein     Tel : 0546458214
17042 La Rochelle Cedex - France  Fax : 0546458245
__________________________________________________
