Hi all,
This is my first post to this list. I'm trying to understand why our
OpenBSD PF router cannot sustain the gigabit speeds we need.
I have two Dell 1750s, each with a single 2.8 GHz Xeon. The first is our
production router, which has been running PF on an OpenBSD 3.4 beta for
two years; the second is a fresh OpenBSD 3.7 install with the stock
GENERIC kernel. The ultimate goal is to build a CARP dual-router setup
with the two machines.
The problem is that neither machine can route at more than ~350 Mbit/s,
even without PF (which could slow things down in principle, though I
doubt it does here).
To validate that the server can cope with simultaneous gigabit streams
in both directions, I ran several tests.
- First, validate the external test machine and the network.
Here is a simultaneous (-d) iperf TCP test between 2 Sun V40Z (SLES9
with Broadcom 5703). Between them, there's a HP Procurve 2824 Gigabit
switch with full-duplex enabled and properly negotiated on all ports :
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <Linux iperf server address> -d -w 256k
../..
[ 4] 0.0-10.0 sec 1.01 GBytes 864 Mbits/sec
[ 5] 0.0-10.0 sec 1.01 GBytes 865 Mbits/sec
=> The network AND the V40Z are capable of quasi full-duplex symmetric
gigabit. OK.
- That being said, let's try the same thing between a V40Z and a
Dell 1750 (OpenBSD 3.7 with a Broadcom 5704).
First, a non-simultaneous (-r) TCP test between the V40Z and a 1750:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <OpenBSD iperf server address> -r -w 256k
../..
[ 4] 0.0-10.0 sec 1.09 GBytes 935 Mbits/sec
[ 4] 0.0-10.0 sec 1.09 GBytes 938 Mbits/sec
=> More than 1 GB is transferred in 10 s in each direction, one after
the other. The unidirectional bandwidth of 1 Gbit/s is nearly reached;
no problem here.
- Now let's run both directions simultaneously (-d) between the V40Z
and the Dell 1750, as in the first iperf test between the two Linux boxes:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <OpenBSD iperf server address> -d -w 256k
../..
[ 4] 0.0-10.0 sec 403 MBytes 338 Mbits/sec
[ 5] 0.0-10.0 sec 1.02 GBytes 876 Mbits/sec
=> The OpenBSD box never receives more than ~330 Mbit/s when it is
transmitting on the wire at the same time. In every run I tried, the
behavior was completely reproducible.
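Summing both directions makes the asymmetry obvious; a quick back-of-the-envelope check using the figures from the two duplex runs above (not a new measurement):

```shell
# Aggregate duplex throughput in Mbit/s, from the iperf results above:
awk 'BEGIN { print "Linux <-> Linux: " 864+865 " Mbit/s" }'
awk 'BEGIN { print "Linux <-> OpenBSD: " 876+338 " Mbit/s" }'
```

So the OpenBSD pair moves roughly 500 Mbit/s less in aggregate, and all of the shortfall is on the receive side.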
- Given this behavior, let's switch to UDP transfers to pin down the
rate at which the problem begins. I set the Linux client to send and
receive at 46 Mbit/s:
ROOT:Linux: > ./iperf -i 1 -w 256k -c <OpenBSD iperf server address> -b 46M -d -u
On the OpenBSD we can see this :
ROOT:OpenBSD: > ./iperf -i 1 -s -u -w 256k
../..
[ ID] Interval       Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5] 0.0-10.0 sec   54.5 MBytes  45.7 Mbits/sec  0.264 ms  372/39217 (0.95%)
[  7] 0.0-10.0 sec   55.0 MBytes  46.1 Mbits/sec  0.002 ms  0/39217 (0%)
=> The OpenBSD box starts losing inbound packets at a rate as low as
46 Mbit/s, while still sending outbound packets without any problem.
And that is only the beginning; here is what we get with an
800 Mbit/s stream:
ROOT:ob35bckp:/root/compile/iperf-1.7.0 > ./iperf -i 1 -s -u -w 256k
../..
[ ID] Interval       Transfer     Bandwidth        Jitter     Lost/Total Datagrams
[  7] 0.0-10.0 sec   976 MBytes   819 Mbits/sec     0.013 ms  0/696200 (0%)
[  5] 0.0-10.3 sec   79.4 MBytes  65.0 Mbits/sec   14.982 ms  657633/714260 (92%)
Now it's dramatic: the packet loss reaches 92%!
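(For clarity, iperf's loss figure is simply lost datagrams over total datagrams; checking the arithmetic on the run above:)

```shell
# iperf reports loss as lost/total datagrams: 657633 of 714260
awk 'BEGIN { printf "%.1f%%\n", 657633 / 714260 * 100 }'
```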
My guess is that the problem is a lack of available buffers for the
network card: it receives packets but has nowhere to put them...
A few elements to help the analysis:
During the iperf test, here's what netstat is saying :
ROOT:OpenBSD: > netstat -m
1263 mbufs in use:
1142 mbufs allocated to data
3 mbufs allocated to packet headers
118 mbufs allocated to socket names and addresses
627/670/6144 mbuf clusters in use (current/peak/max)
1676 Kbytes allocated to network (93% in use)
And now, here is what it's saying while idle :
ROOT:OpenBSD: >netstat -m
1033 mbufs in use:
1027 mbufs allocated to data
3 mbufs allocated to packet headers
3 mbufs allocated to socket names and addresses
512/724/6144 mbuf clusters in use (current/peak/max)
1784 Kbytes allocated to network (71% in use)
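For scale: assuming the usual 2 KB cluster size (MCLBYTES = 2048 on i386 — an assumption on my part, not read off this box), the 6144-cluster ceiling shown above corresponds to only about 12 MB of cluster memory for the whole network stack:

```shell
# Max mbuf cluster pool: 6144 clusters of 2048 bytes (assumed MCLBYTES on i386)
awk 'BEGIN { printf "%.0f MB\n", 6144 * 2048 / (1024 * 1024) }'
```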
The OpenBSD boxes have had plenty of memory to handle thousands of PF
states for years, as you can see in the dmesg of the 3.7 machine:
OpenBSD 3.7 (GENERIC) #50: Sun Mar 20 00:01:57 MST 2005
[EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Xeon(TM) CPU 2.80GHz ("GenuineIntel" 686-class) 2.79 GHz
cpu0:
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID
real mem = 1073160192 (1048008K)
avail mem = 972750848 (949952K)
using 4278 buffers containing 53760000 bytes (52500K) of memory
The TCP and UDP buffers are no longer at their stock values:
ROOT:OpenBSD: > sysctl -a |grep space
net.inet.tcp.recvspace=131072
net.inet.tcp.sendspace=131072
net.inet.udp.recvspace=131072
net.inet.udp.sendspace=131072
The NMBCLUSTERS kernel option has disappeared, but what I assume to be
its successor (kern.maxclusters ?) does not improve anything when increased.
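For reference, the runtime knobs involved can all be set in /etc/sysctl.conf; the values below are purely illustrative (and, as said above, raising kern.maxclusters did not help on this box):

```
# /etc/sysctl.conf -- illustrative values only
kern.maxclusters=8192
net.inet.udp.recvspace=131072
net.inet.udp.sendspace=131072
```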
Following a suggestion from Henning Brauer, I tried changing IFQ_MAXLEN
in sys/net/if.c. Unfortunately it didn't fix the problem: with both the
stable and current kernel sources I tried many values for IFQ_MAXLEN,
up to 2000, without success.
I still see drops at rates as low as 46 Mbit/s with UDP, and the drop
counter in netstat -s increases accordingly:
810148 datagrams received
710297 dropped due to full socket buffers
One more test, to rule out the bge behaving differently depending on
whether jumbo frames are involved. The Dell has two interfaces: bge0
(no IP address, but MTU 8000 so that full 1500-byte frames fit on the
vlan interface), which carries one vlan interface, and bge1 configured
natively (untagged) on another LAN. The drops are the same whichever
interface I use...
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 8000
address: 00:0d:56:fd:58:cd
media: Ethernet 1000baseT full-duplex
status: active
inet6 fe80::20d:56ff:fefd:58cd%bge0 prefixlen 64 scopeid 0x1
vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
address: 00:0d:56:fd:58:cd
vlan: 10 parent interface: bge0
inet6 fe80::20d:56ff:fefd:58cd%vlan10 prefixlen 64 scopeid 0x7
inet 10.10.0.254 netmask 0xffff0000 broadcast 10.10.255.255
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
address: 00:0d:56:fd:58:ce
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet 10.1.0.254 netmask 0xffff0000 broadcast 10.1.255.255
inet6 fe80::20d:56ff:fefd:58ce%bge1 prefixlen 64 scopeid 0x2
The fact is that the card has no problem receiving a gigabit stream as
long as it isn't transmitting at the same time...
That said, I really don't know what to try next...
Has anyone here encountered and solved this problem before? I'm pretty
sure I'm not the only one seeing it.
Thanks in advance !!
Frederic
--
__________________________________________________
Frederic BRET - Universite de La Rochelle
Centre de Ressources Informatiques
Technoforum - Avenue Einstein Tel : 0546458214
17042 La Rochelle Cedex - France Fax : 0546458245
__________________________________________________