Re: Performance tests with new network stack

2014-09-25 Thread Joel Sherrill
Thank you for posting this detailed analysis. This is all I was
asking for. I knew you had to have some hard data to want
that change.

I thought at one point, we had populated the new stack with
the checksum code from the old stack for the highly optimized
cases. Was the PowerPC one where we just trusted their new
code and our old code was more optimized?

Am I not remembering this correctly?

The other hot code spot was memcpy() but that should be the
same in both configurations since it is in newlib.

--joel
On 9/25/2014 1:15 AM, Sebastian Huber wrote:
> Hello,
>
> I used simple FTP transfers to/from the target to measure the TCP performance
> of the new network stack on a PowerPC MPC8309.  The new network stack is a
> port from FreeBSD 9.2.  It is highly optimized for SMP and uses fine-grained
> locking.  For uniprocessor systems this is not a benefit.  About 2000 mutexes
> are present in the idle state of the stack.  It turned out that the standard
> RTEMS semaphores are a major performance bottleneck.  I added a lightweight
> alternative (rtems_bsd_mutex).  For fine-grained locking it is important that
> the uncontested mutex obtain/release is as fast as possible.
>
> With the latest version (struct timespec and rtems_bsd_mutex) I get this:
>
> curl -o /dev/null  ftp://anonymous@192.168.100.70/dev/zero
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0 1194M    0     0  9101k      0 --:--:--  0:02:14 --:--:-- 9158k
>
> perf disabled   coverage: 100.000%  runtime:  99.998%   covtime: 100.000%
> name                         | ratio
> in_cksumdata                 | 11.137%
> memcpy                       | 10.430%
> tcp_output                   |  7.189%
> ip_output                    |  3.241%
> uma_zalloc_arg               |  2.710%
> ether_output                 |  2.533%
> tcp_do_segment               |  2.121%
> m_copym                      |  2.062%
> uma_zfree_arg                |  2.062%
> bsd__mtx_unlock_flags        |  2.062%
> tcp_input                    |  2.003%
> Thread_Dispatch              |  1.885%
> rtalloc1_fib                 |  1.649%
> ip_input                     |  1.708%
> memmove                      |  1.532%
> rn_match                     |  1.473%
> tcp_addoptions               |  1.414%
> arpresolve                   |  1.355%
> in_cksum_skip                |  1.296%
> memset                       |  1.296%
> mb_dupcl                     |  1.178%
> uec_if_dequeue               |  1.178%
> in_lltable_lookup            |  1.119%
> rtfree                       |  1.001%
> ether_nh_input               |  1.001%
> uec_if_bd_wait_and_free      |  1.001%
> quicc_bd_tx_submit_and_wait  |  1.001%
> TOD_Get_with_nanoseconds     |  1.001%
> uec_if_interface_start       |  0.942%
> bsd__mtx_lock_flags          |  0.883%
> bzero                        |  0.883%
> mb_ctor_mbuf                 |  0.824%
> mb_free_ext                  |  0.824%
> netisr_dispatch_src          |  0.824%
> in_pcblookup_hash_locked.isr |  0.766%
> bsd_critical_enter           |  0.766%
> rw_runlock                   |  0.707%
> if_transmit                  |  0.707%
> Timespec_A

Re: Performance tests with new network stack

2014-09-25 Thread Sebastian Huber

On 25/09/14 16:01, Joel Sherrill wrote:

> I thought at one point, we had populated the new stack with
> the checksum code from the old stack for the highly optimized
> cases. Was the PowerPC one where we just trusted their new
> code and our old code was more optimized?

FreeBSD no longer uses optimized routines since state-of-the-art network
interface controllers do this checksum stuff in hardware.


NetBSD has optimized variants, but the APIs differ.  They also have the 
advertising clause in their license.


I don't have time to write optimized assembler code myself at the moment.

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.
___
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel

Re: Performance tests with new network stack

2014-09-25 Thread Joel Sherrill

On 9/25/2014 9:08 AM, Sebastian Huber wrote:
> On 25/09/14 16:01, Joel Sherrill wrote:
>> I thought at one point, we had populated the new stack with
>> the checksum code from the old stack for the highly optimized
>> cases. Was the PowerPC one where we just trusted their new
>> code and our old code was more optimized?
> FreeBSD no longer uses optimized routines since state of the art network 
> interface controllers do this checksum stuff in hardware.
>
> NetBSD has optimized variants, but the APIs differ.  They also have the 
> advertising clause in their license.
>
> I don't have time to write optimized assembler code myself at the moment.
>
OK. Then is this now a function of the NIC driver for those that don't
support hardware checksum calculation?

Looking back at the history, I wrote the current PowerPC checksum routine a LONG
time ago based on the x86 one, and neither has an advertising clause. They could
be copied and the API updated if someone cares.

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985
