On 25/4/19 7:37 am, Jonathan Brandmeyer wrote:
> Any good tips & tricks I should know about how to optimize the
> rtems-libbsd networking stack?
I use the stack defaults with an /etc/rc.conf of:

TELn [/] # cat /etc/rc.conf
#
# Hydra LibBSD Configuration
#
hostname="XXX-880452-0014"
ifconfig_cgem0="DHCP rxcsum txcsum"
ifconfig_cgem0_alias0="ether 20:c3:05:11:00:25"
dhcpcd_priority="200"
dhcpcd_options="--nobackground --timeout 10"
telnetd_enable="YES"
TELn [/] # ifconfig
cgem0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=68008b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 20:c3:05:11:00:25
        hwaddr 0e:b0:ba:5e:ba:11
        inet6 fe80::72b3:d5ff:fec1:6029%cgem0 prefixlen 64 scopeid 0x1
        inet 10.10.5.189 netmask 0xffffff00 broadcast 10.10.5.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

Using a recent kernel, libbsd, and a custom protobufs protocol, some
simple testing I did showed a sustained TX of 800 Mbps with higher peaks
on TCP and UDP. I am not sure what the socket buffer sizes are set to.

Note, the interface setup enables hardware checksums for the TX and RX
paths.

> Case:
> - Cortex-A9, dual-core, SMP mode, using the zynq BSP on MicroZed hardware.
> - RTEMS v5, using the libbsd networking layer.
> - Network is otherwise idle.
> - Test case is a trivial program that just read()'s from a socket in a
>   loop into a 10 kB buffer, while using netcat or iperf on the sender to
>   stream data through the pipe. Nothing is done with the data after it
>   is read; we just read another buffer.
>
> The throughput isn't great. I'm seeing ~390 Mbps with default
> settings. When testing with iperf as the client, I see that one IRQS
> server is credited with almost exactly the same amount of runtime as
> the test duration, and that the SHEL task (implementing the server
> side of the socket) is credited with about 40% of that time as well.
>
> Without a detailed CPU profiler, it's hard to know exactly where the
> time is being spent in the networking stack, but it clearly is
> CPU-limited. Enabling hardware checksum offload improved throughput
> from ~390 Mbps to ~510 Mbps. Our dataflow is such that jumbo frames
> would be an option, but the Cadence device doesn't support an MTU
> larger than 1500 bytes. Disabling the fancy networking features used
> by the libbsd test programs had no effect.

I use the shell command `top` to look at the CPU load. With a single
core I had capacity left, i.e. IDLE was not 0%. I think I was limited by
the data feed from the PL.

> Ethernet is not used in our field configuration, but in our testing
> configuration we were aiming for about 500 Mbps throughput with about
> 1.5 cores left for additional processing. Are there any other tunable
> knobs that can get some more throughput? XAPP1082 suggests that
> inbound throughput in the 750+ Mbps range is achievable... on a
> completely different OS and network stack.
>
> Speaking of tunables, I do see via `sysctl` that
> `dev.cgem.0.stats.rx_resource_errs` and `dev.cgem.0._rxnobufs` are
> nonzero after a benchmark run. But if the test is CPU limited, then I
> wouldn't expect throwing buffers at the problem to help.

I would attempt to separate the networking performance testing from your
app's ability to consume the data. This may help isolate the performance
issue.

Chris
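
As a rough sketch of the separation suggested above, a receive-only sink
along the following lines measures what the stack alone can deliver
before any application processing is added. It is plain POSIX sockets,
not code from either setup in this thread; the port number and the
SO_RCVBUF size are arbitrary assumptions.

/*
 * Receive-only TCP sink for benchmarking the stack in isolation.
 * Port 5001 and the 256 kB receive buffer are assumed values.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0) {
        perror("socket");
        return 1;
    }

    /* Enlarge the receive buffer on the listener; BSD-derived stacks
       typically propagate it to accepted sockets. */
    int rcvbuf = 256 * 1024;
    setsockopt(srv, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5001);          /* arbitrary test port */

    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(srv, 1) != 0) {
        perror("bind/listen");
        return 1;
    }

    int conn = accept(srv, NULL, NULL);
    if (conn < 0) {
        perror("accept");
        return 1;
    }

    /* Same pattern as the test case: read into a 10 kB buffer and
       discard the data, so only the stack cost is measured. */
    static char buf[10 * 1024];
    long long total = 0;
    ssize_t n;
    while ((n = read(conn, buf, sizeof(buf))) > 0)
        total += n;

    printf("received %lld bytes\n", total);
    close(conn);
    close(srv);
    return 0;
}

Driving this with iperf or netcat from the sending host, as in the
original test, gives a stack-only baseline to compare against the full
application; the default buffer sizes should also be visible through
sysctls such as net.inet.tcp.recvspace and kern.ipc.maxsockbuf that the
FreeBSD-derived stack carries over.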