Hi,
Just a few updates about our problem. I tried a different setup this
time: no iperf on the OpenBSD router, only IP forwarding.
I have one Linux box (kernel 2.6) as a load generator on each of 2 vlans
routed by the OpenBSD machine, which runs no process except a top to
monitor interrupts and load level.
I modified iperf to generate very small packets: this time I want to
determine the bottleneck on OpenBSD.
The most I can push between the 2 Linux boxes before the router reaches
an interrupt load of 100% is about 140 Kpps (-> 5 MB/s only):
[ 4] 1.0- 2.0 sec 5.45 MBytes 45.7 Mbits/sec 0.009 ms 0/142862 (0%)
[ 4] 2.0- 3.0 sec 5.45 MBytes 45.7 Mbits/sec 0.010 ms 0/142860 (0%)
[ 4] 3.0- 4.0 sec 5.45 MBytes 45.7 Mbits/sec 0.009 ms 0/142841 (0%)
=> More packets/s from the sender would mean dropped packets on the
router and loss on the receiver.
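
As a rough cross-check of the numbers above, the following Python sketch derives the payload size per datagram from an iperf report line and estimates the corresponding rate on the wire. The header and framing sizes are standard IPv4/UDP/Ethernet values, not figures measured in this test, and the frames are assumed to be untagged (a VLAN tag would add 4 bytes each). The function names are made up for illustration.

```python
# Standard sizes (assumptions, not taken from the test itself).
MIB = 1024 * 1024        # iperf reports "MBytes" in units of 2**20 bytes
UDP_IP_HEADERS = 8 + 20  # UDP header + IPv4 header without options
ETH_FRAMING = 14 + 4     # Ethernet header + FCS, untagged frame
ETH_GAP = 8 + 12         # preamble/SFD + inter-frame gap
MIN_FRAME = 64           # minimum Ethernet frame size, FCS included


def payload_bytes(mbytes, datagrams):
    """Average UDP payload per datagram for one iperf interval."""
    return mbytes * MIB / datagrams


def wire_mbits(pps, payload):
    """Approximate line occupancy in Mbit/s for a given packet rate."""
    frame = max(payload + UDP_IP_HEADERS + ETH_FRAMING, MIN_FRAME)
    return pps * (frame + ETH_GAP) * 8 / 1e6


size = round(payload_bytes(5.45, 142862))
print("payload per datagram:", size, "bytes")
print("rate on the wire: %.0f Mbit/s" % wire_mbits(142862, size))
```

With the first report line (5.45 MBytes, 142862 datagrams) this gives about 40 bytes of payload per datagram, so the router saturates on packet count long before bandwidth becomes the limit.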
And to show the maximum capability of the load generators, here is the
same single-threaded iperf test between the 2 Linux boxes on the same
vlan (without the router):
[ 5] 1.0- 2.0 sec 12.6 MBytes 105 Mbits/sec 0.003 ms 0/329213 (0%)
[ 5] 2.0- 3.0 sec 12.6 MBytes 106 Mbits/sec 0.003 ms 0/329980 (0%)
[ 5] 3.0- 4.0 sec 12.6 MBytes 106 Mbits/sec 0.003 ms 0/330488 (0%)
330 Kpps without loss and 1 busy CPU on the Linux box. That is more than
twice the packet rate achieved through the router.
(And just for fun: despite the announced capability of FreeBSD to route
1 Mpps, our FreeBSD 5.3 on a dual Opteron is also only able to route
~140 Kpps.)
So the conclusion may be that the BSD machines are limited by the
ability of their OS to handle interrupts efficiently...
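
To put the 140 Kpps ceiling in perspective, here is a minimal Python sketch of the time and CPU cycles available per forwarded packet. It assumes the router is the 2.79 GHz Xeon shown in the dmesg quoted below, and that an interrupt load of 100% means the whole CPU is spent on forwarding. Both are assumptions, and the function name is hypothetical.

```python
def per_packet_budget(pps, cpu_hz):
    """Return (microseconds, CPU cycles) available per forwarded packet."""
    if pps <= 0:
        raise ValueError("pps must be positive")
    return 1e6 / pps, cpu_hz / pps


# Assumption: 2.79 GHz clock, as reported by the quoted dmesg.
micros, cycles = per_packet_budget(142862, 2.79e9)
print("%.1f us, about %.0f cycles per packet" % (micros, cycles))
```

At 142862 packets/s this works out to roughly 7 microseconds, or on the order of 20,000 cycles, per packet.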
What do you think about this ?
Frederic
Frederic BRET wrote:
Hi all,
This is my first post to this list. I'm trying to understand why our
OpenBSD PF router is not able to cope with the gigabit speeds we
need...
I have two Dell 1750 machines, each with a single 2.8GHz Xeon. The first
is our production router, which has been running OpenBSD 3.4 beta with
PF for 2 years; the second is a fresh OpenBSD 3.7 install with the stock
GENERIC kernel. The ultimate goal is to build a dual CARP router with
the 2 machines.
The problem is that neither machine is able to route at more than
~350 Mbit/s, even without PF, which could slow things down, though I
doubt it does.
In order to validate the capacity of the server to cope with
simultaneous up/down gigabit streams, I've done several tests:
- First, validate the external test machine and the network.
Here is a simultaneous (-d) iperf TCP test between 2 Sun V40Z (SLES9
with Broadcom 5703). Between them, there's a HP Procurve 2824 Gigabit
switch with full-duplex enabled and properly negotiated on all ports :
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <Linux iperf server address> -d -w 256k
../..
[ 4] 0.0-10.0 sec 1.01 GBytes 864 Mbits/sec
[ 5] 0.0-10.0 sec 1.01 GBytes 865 Mbits/sec
=> The network AND the V40Z are capable of symmetric, nearly full-duplex
gigabit. OK
- That said, let's try the same thing between a V40Z and a DELL 1750
(OpenBSD 3.7 with Broadcom 5704).
First let's do a non-simultaneous (-r) TCP test between the V40Z and a
1750:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <OpenBSD iperf server address> -r -w 256k
../..
[ 4] 0.0-10.0 sec 1.09 GBytes 935 Mbits/sec
[ 4] 0.0-10.0 sec 1.09 GBytes 938 Mbits/sec
=> More than 1 GB is transferred in 10 s in one direction, then in the
other. Unidirectional bandwidth is close to 1 Gbit/s, no problem.
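
For context on why 935-938 Mbits/s counts as essentially line rate, this small Python sketch estimates the best-case TCP goodput on gigabit Ethernet. It assumes a 1500-byte MTU, untagged frames, IPv4 without options, and TCP timestamps enabled (12 bytes of TCP options per segment). None of this was verified on the machines in this test, and the function name is made up.

```python
def max_tcp_goodput_mbits(mtu=1500, tcp_options=12, line_rate_mbits=1000.0):
    """Best-case TCP payload rate for full-size segments on Ethernet."""
    payload = mtu - 20 - 20 - tcp_options  # minus IPv4 and TCP headers
    on_wire = mtu + 14 + 4 + 8 + 12        # header, FCS, preamble, gap
    return line_rate_mbits * payload / on_wire


print("with timestamps:    %.0f Mbit/s" % max_tcp_goodput_mbits())
print("without timestamps: %.0f Mbit/s" % max_tcp_goodput_mbits(tcp_options=0))
```

Under these assumptions the ceiling is about 941 Mbits/s with timestamps and about 949 Mbits/s without, so the unidirectional results above are within about 1% of what the link can carry.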
- Now let's try simultaneously (-d) between the V40Z and the DELL 1750,
like the first iperf test between the 2 Linux boxes:
ROOT:Linux:/opt/iperf2/bin > ./iperf -i 1 -c <OpenBSD iperf server address> -d -w 256k
../..
[ 4] 0.0-10.0 sec 403 MBytes 338 Mbits/sec
[ 5] 0.0-10.0 sec 1.02 GBytes 876 Mbits/sec
=> The OpenBSD box isn't able to receive more than ~330 Mbits/s while it
is sending at the same time. Every time I tried, the behaviour was the
same.
- Given this, let's try UDP transfers in order to determine the speed at
which the problem begins. I set the Linux client to send and receive at
46 Mbits/s:
ROOT:Linux: > ./iperf -i 1 -w 256k -c <OpenBSD iperf server address> -b 46M -d -u
On the OpenBSD we can see this :
ROOT:OpenBSD: > ./iperf -i 1 -s -u -w 256k
../..
[ ID] Interval       Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[ 5]  0.0-10.0 sec   54.5 MBytes  45.7 Mbits/sec  0.264 ms  372/39217 (0.95%)
[ 7]  0.0-10.0 sec   55.0 MBytes  46.1 Mbits/sec  0.002 ms  0/39217 (0%)
=> We begin to lose inbound packets on the OpenBSD box at rates as low
as 46 Mbits/s, while still sending outbound packets without problems.
Of course that's only the beginning; this is what we get with a stream
of 800 Mbits/s:
ROOT:ob35bckp:/root/compile/iperf-1.7.0 > ./iperf -i 1 -s -u -w 256k
../..
[ ID] Interval       Transfer     Bandwidth       Jitter     Lost/Total Datagrams
[ 7]  0.0-10.0 sec   976 MBytes   819 Mbits/sec   0.013 ms   0/696200 (0%)
[ 5]  0.0-10.3 sec   79.4 MBytes  65.0 Mbits/sec  14.982 ms  657633/714260 (92%)
Now it's dramatic: the packet loss is 92%!
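
To compare runs at different rates without reading the percentages by eye, a small Python sketch like the following could pull the lost/total counters out of an iperf UDP server report line. The regular expression is written against the iperf 1.7.0 output shown in this message and is an assumption about its layout, not a documented format. The helper name is hypothetical.

```python
import re

# Matches the tail of a UDP server report line: "<jitter> ms <lost>/<total>".
LOST_TOTAL = re.compile(r"([\d.]+)\s+ms\s+(\d+)/\s*(\d+)")


def loss_fraction(report_line):
    """Return lost/total for one report line, or None if it does not match."""
    m = LOST_TOTAL.search(report_line)
    if m is None:
        return None
    lost, total = int(m.group(2)), int(m.group(3))
    return lost / total if total else None


low = "[ 5] 0.0-10.0 sec 54.5 MBytes 45.7 Mbits/sec 0.264 ms 372/39217 (0.95%)"
high = "[ 5] 0.0-10.3 sec 79.4 MBytes 65.0 Mbits/sec 14.982 ms 657633/714260 (92%)"
for line in (low, high):
    print("%.2f%% lost" % (100 * loss_fraction(line)))
```
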
I guess it's a problem caused by unavailable buffers for the network
card. It's receiving network packets but has no place to put them...
Here are a few elements that may help the analysis:
During the iperf test, here's what netstat is saying :
ROOT:OpenBSD: > netstat -m
1263 mbufs in use:
1142 mbufs allocated to data
3 mbufs allocated to packet headers
118 mbufs allocated to socket names and addresses
627/670/6144 mbuf clusters in use (current/peak/max)
1676 Kbytes allocated to network (93% in use)
And now, here is what it's saying while idle :
ROOT:OpenBSD: >netstat -m
1033 mbufs in use:
1027 mbufs allocated to data
3 mbufs allocated to packet headers
3 mbufs allocated to socket names and addresses
512/724/6144 mbuf clusters in use (current/peak/max)
1784 Kbytes allocated to network (71% in use)
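
One way to read these two samples is to compare the cluster counters against the configured maximum. The Python sketch below does that for the "current/peak/max" line; it only parses the netstat -m text shown here, and the function name is made up. Note that it says nothing about the receive ring of the network card itself, which netstat -m does not show.

```python
import re


def cluster_usage(netstat_m_output):
    """Return (current/max, peak/max) from 'netstat -m' text, or None."""
    m = re.search(r"(\d+)/(\d+)/(\d+) mbuf clusters in use", netstat_m_output)
    if m is None:
        return None
    current, peak, maximum = (int(g) for g in m.groups())
    return current / maximum, peak / maximum


under_load = "627/670/6144 mbuf clusters in use (current/peak/max)"
idle = "512/724/6144 mbuf clusters in use (current/peak/max)"
for label, text in (("load", under_load), ("idle", idle)):
    cur, peak = cluster_usage(text)
    print("%s: current %.0f%%, peak %.0f%% of max" % (label, 100 * cur, 100 * peak))
```

In both samples the peak stays at roughly 11-12% of the maximum, which suggests the global cluster pool is not what runs out during the test.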
The OpenBSD machines have more than enough memory: they have dealt with
thousands of PF states for years. Here is the dmesg of the 3.7 one:
OpenBSD 3.7 (GENERIC) #50: Sun Mar 20 00:01:57 MST 2005
[EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Xeon(TM) CPU 2.80GHz ("GenuineIntel" 686-class) 2.79 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID
real mem = 1073160192 (1048008K)
avail mem = 972750848 (949952K)
using 4278 buffers containing 53760000 bytes (52500K) of memory
The TCP and UDP buffers are no longer at their stock values:
ROOT:OpenBSD: > sysctl -a |grep space
net.inet.tcp.recvspace=131072
net.inet.tcp.sendspace=131072
net.inet.udp.recvspace=131072
net.inet.udp.sendspace=131072
NBMCLUSTERS has disappeared, and increasing what I guess is its
successor (kern.maxclusters ?) doesn't help.
Following Henning Brauer's advice, I tried changing IFQ_MAXLEN in
sys/net/if.c. Unfortunately it didn't fix the problem. With both the
stable and current kernel sources I tried many values for IFQ_MAXLEN, up
to 2000, without success.
I still have drops at rates as low as 46 Mbits/s with UDP, and the drop
counter from netstat -s increases accordingly:
810148 datagrams received
710297 dropped due to full socket buffers
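
To relate these counters to the socket buffer sizes, here is a rough Python sketch. It computes the share of received datagrams dropped at the socket, and an upper bound on how long a receive buffer can absorb traffic if the application stops reading. The bound ignores per-mbuf accounting overhead, so the real figure would be lower; both function names are made up for illustration.

```python
def drop_fraction(dropped, received):
    """Share of received datagrams dropped because the socket buffer was full."""
    return dropped / received


def buffer_fill_ms(buffer_bytes, rate_mbits):
    """Time in ms to fill an empty buffer at the given arrival rate."""
    return buffer_bytes * 8 / (rate_mbits * 1e6) * 1e3


print("dropped at socket: %.1f%%" % (100 * drop_fraction(710297, 810148)))
# 131072 is net.inet.udp.recvspace above; 262144 is the -w 256k iperf option.
for size in (131072, 262144):
    print("%d bytes fill in %.2f ms at 819 Mbit/s" % (size, buffer_fill_ms(size, 819)))
```

At the 819 Mbits/s rate seen earlier, either buffer size fills in a few milliseconds at most, so any longer pause in the receiving iperf process would show up as drops of this kind.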
One more test, to rule out any difference in bge behaviour depending on
whether jumbo frames are used: I have 2 interfaces on the Dell. bge0 has
no IP but an MTU of 8000 (to permit 1500-byte frames on the vlan
interface) and carries 1 vlan interface; bge1 is configured natively
(untagged) on another lan. Whichever interface I use, the drops are the
same...
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 8000
address: 00:0d:56:fd:58:cd
media: Ethernet 1000baseT full-duplex
status: active
inet6 fe80::20d:56ff:fefd:58cd%bge0 prefixlen 64 scopeid 0x1
vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
address: 00:0d:56:fd:58:cd
vlan: 10 parent interface: bge0
inet6 fe80::20d:56ff:fefd:58cd%vlan10 prefixlen 64 scopeid 0x7
inet 10.10.0.254 netmask 0xffff0000 broadcast 10.10.255.255
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
address: 00:0d:56:fd:58:ce
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet 10.1.0.254 netmask 0xffff0000 broadcast 10.1.255.255
inet6 fe80::20d:56ff:fefd:58ce%bge1 prefixlen 64 scopeid 0x2
The fact is the card has no problem receiving a gigabit stream as long
as it isn't sending at the same time...
That said, I really don't know what else to try...
Have any of you already encountered and solved this problem? I'm pretty
sure I'm not the only one to have it.
Thanks in advance !!
Frederic
--
__________________________________________________
Frederic BRET - Universite de La Rochelle
Centre de Ressources Informatiques
Technoforum - Avenue Einstein Tel : 0546458214
17042 La Rochelle Cedex - France Fax : 0546458245
__________________________________________________