Re: Performance tests with new network stack
Thank you for posting this detailed analysis. This is all I was asking for. I knew you had to have some hard data to want that change.

I thought at one point we had populated the new stack with the checksum code from the old stack for the highly optimized cases. Was the PowerPC one where we just trusted their new code and our old code was more optimized? Am I not remembering this correctly?

The other hot code spot was memcpy(), but that should be the same in both configurations since it is in newlib.

--joel

On 9/25/2014 1:15 AM, Sebastian Huber wrote:
> Hello,
>
> I used simple FTP transfers to/from the target to measure the TCP
> performance of the new network stack on a PowerPC MPC8309. The new
> network stack is a port from FreeBSD 9.2. It is highly optimized for SMP
> and uses fine-grained locking. For uni-processor systems this is not a
> benefit. About 2000 mutexes are present in the idle state of the stack.
> It turned out that the standard RTEMS semaphores are a major performance
> bottleneck. I added a lightweight alternative (rtems_bsd_mutex). For
> fine-grained locking it is important that the uncontested mutex
> obtain/release is as fast as possible.
>
> With the latest version (struct timespec and rtems_bsd_mutex) I get this:
>
> curl -o /dev/null ftp://anonymous@192.168.100.70/dev/zero
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0 1194M    0     0  9101k      0 --:--:--  0:02:14 --:--:-- 9158k
>
> perf disabled coverage: 100.000% runtime: 99.998% covtime: 100.000%
>
> name                         | ratio   |
> in_cksumdata                 | 11.137% |
> memcpy                       | 10.430% |
> tcp_output                   |  7.189% |
> ip_output                    |  3.241% |
> uma_zalloc_arg               |  2.710% |
> ether_output                 |  2.533% |
> tcp_do_segment               |  2.121% |
> m_copym                      |  2.062% |
> uma_zfree_arg                |  2.062% |
> bsd__mtx_unlock_flags        |  2.062% |
> tcp_input                    |  2.003% |
> Thread_Dispatch              |  1.885% |
> rtalloc1_fib                 |  1.649% |
> ip_input                     |  1.708% |
> memmove                      |  1.532% |
> rn_match                     |  1.473% |
> tcp_addoptions               |  1.414% |
> arpresolve                   |  1.355% |
> in_cksum_skip                |  1.296% |
> memset                       |  1.296% |
> mb_dupcl                     |  1.178% |
> uec_if_dequeue               |  1.178% |
> in_lltable_lookup            |  1.119% |
> rtfree                       |  1.001% |
> ether_nh_input               |  1.001% |
> uec_if_bd_wait_and_free      |  1.001% |
> quicc_bd_tx_submit_and_wait  |  1.001% |
> TOD_Get_with_nanoseconds     |  1.001% |
> uec_if_interface_start       |  0.942% |
> bsd__mtx_lock_flags          |  0.883% |
> bzero                        |  0.883% |
> mb_ctor_mbuf                 |  0.824% |
> mb_free_ext                  |  0.824% |
> netisr_dispatch_src          |  0.824% |
> in_pcblookup_hash_locked.isr |  0.766% |
> bsd_critical_enter           |  0.766% |
> rw_runlock                   |  0.707% |
> if_transmit                  |  0.707% |
> Timespec_A
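
The quoted message stresses that with fine-grained locking the uncontested obtain/release path has to be as cheap as possible. As a rough illustration of that idea only, here is a minimal sketch of an uncontested fast path using C11 atomics. It is not the rtems_bsd_mutex implementation; the names `sketch_mutex`, `sketch_mutex_init`, `sketch_mutex_try_lock` and `sketch_mutex_unlock` are hypothetical, and the blocking slow path that a real mutex needs is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type for illustration; not the RTEMS rtems_bsd_mutex. */
typedef struct {
    atomic_bool locked;
} sketch_mutex;

void sketch_mutex_init(sketch_mutex *m)
{
    atomic_init(&m->locked, false);
}

/*
 * Fast path: a single compare-and-exchange. Returns true if the mutex
 * was free and is now held by the caller. A real mutex would fall back
 * to a slow path (enqueue and block) when this returns false.
 */
bool sketch_mutex_try_lock(sketch_mutex *m)
{
    bool expected = false;

    return atomic_compare_exchange_strong_explicit(
        &m->locked, &expected, true,
        memory_order_acquire, memory_order_relaxed);
}

/* Release: a single store. Waking up waiters is omitted in this sketch. */
void sketch_mutex_unlock(sketch_mutex *m)
{
    assert(atomic_load_explicit(&m->locked, memory_order_relaxed));
    atomic_store_explicit(&m->locked, false, memory_order_release);
}
```

The point of the sketch is that the uncontested case touches one word and makes no call into a scheduler or object management layer, which is presumably where a general-purpose semaphore spends its time.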
Re: Performance tests with new network stack
On 25/09/14 16:01, Joel Sherrill wrote:
> I thought at one point, we had populated the new stack with
> the checksum code from the old stack for the highly optimized
> cases. Was the PowerPC one where we just trusted their new
> code and our old code was more optimized?

FreeBSD no longer uses optimized routines since state-of-the-art network interface controllers do this checksum stuff in hardware.

NetBSD has optimized variants, but the APIs differ. They also have the advertising clause in their license.

I don't have time to write optimized assembler code myself at the moment.

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.
___
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel
Re: Performance tests with new network stack
On 9/25/2014 9:08 AM, Sebastian Huber wrote:
> On 25/09/14 16:01, Joel Sherrill wrote:
>> I thought at one point, we had populated the new stack with
>> the checksum code from the old stack for the highly optimized
>> cases. Was the PowerPC one where we just trusted their new
>> code and our old code was more optimized?
> FreeBSD no longer uses optimized routines since state of the art network
> interface controllers do this checksum stuff in hardware.
>
> NetBSD has optimized variants, but the APIs differ. They also have the
> advertising clause in their license.
>
> I don't have time to write optimized assembler code myself at the moment.
>
OK. Then is this now a function of the NIC driver for those that don't support hardware checksum calculation?

Looking back at the history, I wrote the current PowerPC one a LONG time ago based on the x86 one and neither has an advertising clause. They could be copied and the API updated if someone cares.

--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985
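
For readers following the checksum discussion: the routine being profiled computes the Internet checksum (RFC 1071), a 16-bit one's complement sum over the data. The following is a minimal portable C sketch of that algorithm, written for clarity rather than speed. It is not the FreeBSD in_cksumdata() code nor the older RTEMS PowerPC routine; the function name `inet_checksum` is hypothetical. An optimized variant would typically sum wider words and unroll the loop, which is what architecture-specific assembler versions do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration: Internet checksum per RFC 1071
 * over a byte buffer, treating the data as big-endian 16-bit words.
 */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    assert(data != NULL || len == 0);

    /* Add up 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;

        /* Fold early so the 32-bit accumulator cannot overflow. */
        if (sum & 0x80000000u) {
            sum = (sum & 0xffffu) + (sum >> 16);
        }
    }

    /* A trailing odd byte is padded with a zero byte on the right. */
    if (len == 1) {
        sum += (uint32_t)p[0] << 8;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xffffu) + (sum >> 16);
    }

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

With the example bytes from RFC 1071 (00 01 f2 03 f4 f5 f6 f7) the folded sum is 0xddf2, so this function returns 0x220d.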