On Thu, Mar 01, 2012 at 12:05:11PM -0800, Vasu Dev wrote:
> On Thu, 2012-03-01 at 08:39 -0500, Neil Horman wrote:
> > On Wed, Feb 29, 2012 at 03:19:57PM -0800, Vasu Dev wrote:
> > > On Mon, 2012-02-27 at 14:22 -0500, Neil Horman wrote:
> > > > Since commit 853d3cbbe431571c3ae822c8f5df43acff344ded went in, we are
> > > > guaranteed a clean division between fcoe code that runs in softirq 
> > > > context and
> > > > code that runs in process context.  This opens the door for us to 
> > > > implement some
> > > > minor cleanups and optimizations in each context.  They're not large, 
> > > > but taken
> > > > as a unit, appear to provide approximately a 0.4%-1% throughput 
> > > > increase.  They
> > > > are mostly spinlock cleanups (removing bh disables where no longer 
> > > > needed), but
> > > > I've also included a change to fcoe_percpu_receive_thread that allows 
> > > > us to
> > > > receive multiple fcoe frames without having to constantly drop and 
> > > > re-acquire
> > > > the rx_list lock.
> > > 
> > > All make sense.  Just curious about the throughput change details: can
> > > you share more, such as what IO size you used and whether the result is
> > > consistent across multiple runs?
> > > 
> > I just did some very rudimentary testing:
> > time dd if=<fcoe block dev> of=/dev/null bs=512 count=1000000
> > I repeated that 100 times and averaged the system time (since this was all
> > system code I was testing).  Did that with and without this patch set and
> > computed the speedup.
> > 
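[For reference, the benchmark loop described above could be scripted roughly as below. This is a hedged sketch, not the original test script: the fcoe block device path is a placeholder (`/dev/zero` stands in so the script runs anywhere), the run count is reduced from the 100 runs used in the original test, and it assumes bash, whose `time -p` reserved word prints `real`/`user`/`sys` in POSIX format on stderr.]

```shell
#!/bin/bash
# Sketch of the benchmark loop: repeat the dd read, collect the "sys"
# time from each run, and average. DEV is a stand-in for the fcoe
# block device used in the original test.
DEV=/dev/zero
RUNS=3                      # original test used 100 runs, count=1000000
times=""
for i in $(seq "$RUNS"); do
    # Capture `time -p` output (on stderr) and pull out the sys time.
    sys=$({ time -p dd if="$DEV" of=/dev/null bs=512 count=100000 \
            2>/dev/null; } 2>&1 | awk '/^sys/ {print $2}')
    times="$times $sys"
done
# Average the collected sys times with awk (avoids a bc dependency).
avg=$(echo "$times" | awk '{s=0; for (i=1;i<=NF;i++) s+=$i; printf "%.3f", s/NF}')
echo "average system time over $RUNS runs: ${avg}s"
```

Running the same loop with and without the patch set applied and comparing the two averages gives the speedup figure quoted above.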
> 
> I got curious and checked the numbers myself, especially after your
> list-splice change that avoids taking the list lock for each skb in a
> batch. On ixgbe the results came out very close before and after this
> series, similar to yours. Some variance across multiple runs with the
> same build is common, at least in my setup, but your longer repeats
> should have averaged that out. 
> 
> The series should help more for NICs that don't route ingress traffic to
> the same cpu the IO originated on, since that causes more contention on
> the rx_list lock.
> 
That's right, it will provide a small boost in the (hopefully) nominal case in
which received traffic is processed on the cpu we expect it on (there are
fewer iterations of spin_lock_bh/spin_unlock_bh).  The real benefit, however,
is in the worst-case scenario, when multiple softirq contexts on other cpus
are contending for a single lock on a third cpu.  I think you can simulate
this by manually messing with irq affinities, but I've not tried it yet.
Neil

> Thanks Neil for additional details. 
> Vasu
> 
> 
_______________________________________________
devel mailing list
[email protected]
https://lists.open-fcoe.org/mailman/listinfo/devel
