Hi all,

This is my first post to this list. I'm trying to understand why our 
OpenBSD PF router cannot cope with the gigabit speeds we need.

I have two Dell 1750 single-Xeon 2.8GHz machines. The first is our 
production router, which has been running an OpenBSD 3.4 beta with PF 
for two years; the second is a fresh OpenBSD 3.7 with the stock GENERIC 
kernel. The ultimate goal is to build a CARP dual-router setup with the 
two machines.

The problem is that neither machine can route at more than ~350 Mbit/s, 
even without PF, which could slow things down but which I doubt is the 
cause here.

To validate that the server can cope with simultaneous gigabit streams 
in both directions, I ran several tests.

- First, validate the external test machines and the network.
Here is a simultaneous (-d) iperf TCP test between two Sun V40Zs (SLES9 
with Broadcom 5703). Between them sits an HP ProCurve 2824 gigabit 
switch, with full duplex enabled and properly negotiated on all ports:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <Linux iperf server address> -d -w 256k
../..
[  4]  0.0-10.0 sec  1.01 GBytes    864 Mbits/sec
[  5]  0.0-10.0 sec  1.01 GBytes    865 Mbits/sec
=> The network AND the V40Zs are capable of near full-duplex symmetric 
gigabit. OK.

- That established, let's try the same thing between a V40Z and a Dell 
1750 (OpenBSD 3.7 with Broadcom 5704).
First, a non-simultaneous (-r) TCP test between the V40Z and a 1750:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <OpenBSD iperf server address> -r -w 256k
../..
[  4]  0.0-10.0 sec  1.09 GBytes    935 Mbits/sec
[  4]  0.0-10.0 sec  1.09 GBytes    938 Mbits/sec
=> More than 1 GB is transferred in 10 s in each direction in turn. 
Unidirectional bandwidth close to 1 Gbit/s is achieved, no problem.

- Now let's try simultaneous streams (-d) between the V40Z and the Dell 
1750, like the first iperf test between the two Linux boxes:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <OpenBSD iperf server address> -d -w 256k
../..
[  4]  0.0-10.0 sec    403 MBytes    338 Mbits/sec
[  5]  0.0-10.0 sec  1.02 GBytes    876 Mbits/sec
=> The OpenBSD box cannot receive more than ~330 Mbit/s whenever it is 
transmitting on the wire at the same time. The behaviour is consistent 
across every run I tried.

- Given this behaviour, let's try UDP transfers to determine the speed 
at which the problem begins. I set the Linux client to send and receive 
at 46 Mbit/s:
ROOT:Linux: > ./iperf -i 1 -w 256k -c <OpenBSD iperf server address> -b 46M -d -u

On the OpenBSD side we see this:
ROOT:OpenBSD: > ./iperf -i 1 -s -u -w 256k
../..
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[  5]  0.0-10.0 sec  54.5 MBytes  45.7 Mbits/sec  0.264 ms  372/39217 (0.95%)
[  7]  0.0-10.0 sec  55.0 MBytes  46.1 Mbits/sec  0.002 ms    0/39217 (0%)
=> The OpenBSD box starts losing inbound packets at only 46 Mbit/s, 
while still sending outbound packets without any problem.

And that is only the beginning; here is what we get with a stream of 
800 Mbit/s:
ROOT:ob35bckp:/root/compile/iperf-1.7.0 > ./iperf -i 1 -s -u -w 256k
../..
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[  7]  0.0-10.0 sec   976 MBytes   819 Mbits/sec  0.013 ms       0/696200 (0%)
[  5]  0.0-10.3 sec  79.4 MBytes  65.0 Mbits/sec  14.982 ms 657633/714260 (92%)
Now it is dramatic: 92% of the packets are lost!

My guess is that the problem is a lack of available buffers for the 
network card: it receives packets but has nowhere to put them.
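To narrow down where the packets actually die, it may help to check whether the drops show up as interface-level input errors or only as socket-buffer drops. A rough sketch using standard netstat invocations (run on the OpenBSD box during a test; exact column names may differ between releases):

```shell
# Interface-level view: a rising Ierrs column would point at the
# driver/NIC running out of receive descriptors or mbuf clusters.
netstat -i

# Protocol-level view: "dropped due to full socket buffers" would point
# at the socket layer instead (the queue between IP input and iperf).
netstat -s | grep -i drop
```

If the interface counters stay clean while the socket-buffer counter climbs, the NIC and driver are keeping up and the bottleneck is above them.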

A few more data points to think about:

During the iperf test, here's what netstat is saying :
ROOT:OpenBSD: > netstat -m
1263 mbufs in use:
       1142 mbufs allocated to data
       3 mbufs allocated to packet headers
       118 mbufs allocated to socket names and addresses
627/670/6144 mbuf clusters in use (current/peak/max)
1676 Kbytes allocated to network (93% in use)

And now, here is what it's saying while idle :
ROOT:OpenBSD: >netstat -m
1033 mbufs in use:
       1027 mbufs allocated to data
       3 mbufs allocated to packet headers
       3 mbufs allocated to socket names and addresses
512/724/6144 mbuf clusters in use (current/peak/max)
1784 Kbytes allocated to network (71% in use)

The OpenBSD box has had plenty of memory to handle thousands of PF 
states for years, as you can see in the dmesg of the 3.7 machine:
OpenBSD 3.7 (GENERIC) #50: Sun Mar 20 00:01:57 MST 2005
   [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Xeon(TM) CPU 2.80GHz ("GenuineIntel" 686-class) 2.79 GHz
cpu0: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID
 

real mem  = 1073160192 (1048008K)
avail mem = 972750848 (949952K)
using 4278 buffers containing 53760000 bytes (52500K) of memory

The TCP and UDP buffers are no longer at their stock values:
ROOT:OpenBSD: > sysctl -a |grep space                 
net.inet.tcp.recvspace=131072
net.inet.tcp.sendspace=131072
net.inet.udp.recvspace=131072
net.inet.udp.sendspace=131072

NMBCLUSTERS has disappeared, and increasing what I assume to be its 
successor (kern.maxclusters ?) does not seem to help either.
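For reference, here is how I have been adjusting it (a sketch only: the value 8192 is an arbitrary example, and kern.maxclusters being the NMBCLUSTERS replacement is my assumption):

```shell
# Check the current ceiling on mbuf clusters (compare with the
# current/peak/max line of netstat -m).
sysctl kern.maxclusters

# Raise it at runtime (8192 is just an example value).
sysctl kern.maxclusters=8192

# Make the change persistent across reboots.
echo 'kern.maxclusters=8192' >> /etc/sysctl.conf
```

netstat -m never shows the cluster pool anywhere near the max, though, which makes me doubt this limit is the one being hit.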

Following a suggestion from Henning Brauer, I tried changing IFQ_MAXLEN 
in sys/net/if.h. Unfortunately it did not fix the problem: with both 
the stable and -current kernel sources I tried many values of 
IFQ_MAXLEN, up to 2000, without success.
I still see drops starting as low as 46 Mbit/s with UDP, and 
accordingly the drop counter from netstat -s keeps increasing:
       810148 datagrams received
       710297 dropped due to full socket buffers
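One way to watch that counter climb in real time during an iperf run (a small sketch; it assumes a Bourne-style shell and the standard netstat -s output shown above):

```shell
# Sample the UDP "full socket buffers" drop counter once per second
# while the iperf stream is running; a steadily rising number confirms
# the packets are being dropped at the socket layer, not by the NIC.
while sleep 1; do
    netstat -s -p udp | grep 'full socket buffers'
done
```

If the counter only moves while the box is simultaneously transmitting, that would match the bidirectional-only failure seen in the TCP tests.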

One more test, to rule out any effect of jumbo frames on the bge 
driver's behaviour: the Dell has two interfaces, bge0 (no IP address, 
but MTU 8000 to allow 1500-byte frames on the vlan interface) carrying 
one vlan interface, and bge1 configured natively (untagged) on another 
LAN. Whichever interface I use, the drops are the same...

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 8000
        address: 00:0d:56:fd:58:cd
        media: Ethernet 1000baseT full-duplex
        status: active
        inet6 fe80::20d:56ff:fefd:58cd%bge0 prefixlen 64 scopeid 0x1
vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        address: 00:0d:56:fd:58:cd
        vlan: 10 parent interface: bge0
        inet6 fe80::20d:56ff:fefd:58cd%vlan10 prefixlen 64 scopeid 0x7
        inet 10.10.0.254 netmask 0xffff0000 broadcast 10.10.255.255

bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        address: 00:0d:56:fd:58:ce
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet 10.1.0.254 netmask 0xffff0000 broadcast 10.1.255.255
        inet6 fe80::20d:56ff:fefd:58ce%bge1 prefixlen 64 scopeid 0x2

The fact is, the card has no problem receiving a gigabit stream as soon 
as it is not transmitting at the same time...


That said, I really don't know what else to try...

Has anyone here encountered and solved this problem before? I'm pretty 
sure I'm not the only one seeing it.

Thanks in advance !!

Frederic

-- 
__________________________________________________
Frederic BRET - Universite de La Rochelle 
Centre de Ressources Informatiques
Technoforum - Avenue Einstein     Tel : 0546458214
17042 La Rochelle Cedex - France  Fax : 0546458245
__________________________________________________
