On Thu, Mar 01, 2012 at 12:05:11PM -0800, Vasu Dev wrote:
> On Thu, 2012-03-01 at 08:39 -0500, Neil Horman wrote:
> > On Wed, Feb 29, 2012 at 03:19:57PM -0800, Vasu Dev wrote:
> > > On Mon, 2012-02-27 at 14:22 -0500, Neil Horman wrote:
> > > > Since commit 853d3cbbe431571c3ae822c8f5df43acff344ded went in, we
> > > > are guaranteed a clean division between fcoe code that runs in
> > > > softirq context and code that runs in process context. This opens
> > > > the door for us to implement some minor cleanups and optimizations
> > > > in each context. They're not large, but taken as a unit they appear
> > > > to provide approximately a 0.4%-1% throughput increase. They are
> > > > mostly spinlock cleanups (removing bh disables where no longer
> > > > needed), but I've also included a change to
> > > > fcoe_percpu_receive_thread that allows us to receive multiple fcoe
> > > > frames without having to constantly drop and re-acquire the
> > > > rx_list lock.
> > >
> > > All make sense. Just curious about the throughput change details:
> > > can you share more, such as what I/O size you used and whether the
> > > result is consistent across multiple runs?
> > >
> > I just did some very rudimentary testing:
> > time dd if=<fcoe block dev> of=/dev/null bs=512 count=1000000
> > I repeated that 100 times and averaged the system time (since it was
> > all system code I was testing). I did that with and without this
> > patch set and computed the speedup.
> >
> I got curious to check the numbers, especially after your list splice
> change not requiring the list lock for a batch of skbs. It came out
> very close before and after this series on ixgbe, similar to your
> results. In fact some variance is common with the same build across
> multiple runs, at least in my setup, but your longer repeats should
> have averaged that out.
>
> The series should help more for NICs that don't route ingress traffic
> to the same cpu on which the I/O originated, as that would cause more
> contention on the rx_list lock.
>
That's right, it will provide a small boost in the (hopefully) nominal
case in which received traffic is processed on the cpu we expect it on
(there are fewer iterations of spin_lock_bh/spin_unlock_bh). The real
benefit is in the worst-case scenario, however, when multiple softirq
contexts on other cpus are contending for a single lock on a third cpu.
I think you can simulate this by manually messing with irq affinities,
but I've not tried it yet.
Neil
> Thanks Neil for the additional details.
>
> Vasu
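
For readers who aren't following the patch series itself, the list
splice change discussed above boils down to draining the shared per-cpu
rx list in one shot. What follows is a minimal sketch of that pattern,
not the merged patch: the skb_queue_splice_init()/__skb_dequeue()
helpers and the fcoe_percpu_s/fcoe_rx_list names are real kernel and
fcoe identifiers of that era, but the exact locking and wakeup logic in
the actual fcoe_percpu_receive_thread may differ.

    #include <linux/kthread.h>
    #include <linux/sched.h>
    #include <linux/skbuff.h>
    #include <linux/spinlock.h>
    #include <scsi/libfcoe.h>

    /* fcoe_recv_frame() stands in for the driver's existing
     * per-frame handler in fcoe.c. */
    extern void fcoe_recv_frame(struct sk_buff *skb);

    static int fcoe_percpu_receive_thread(void *arg)
    {
    	struct fcoe_percpu_s *p = arg;
    	struct sk_buff_head tmp;
    	struct sk_buff *skb;

    	skb_queue_head_init(&tmp);

    	while (!kthread_should_stop()) {
    		/* One lock round trip moves the entire backlog... */
    		spin_lock_bh(&p->fcoe_rx_list.lock);
    		skb_queue_splice_init(&p->fcoe_rx_list, &tmp);
    		spin_unlock_bh(&p->fcoe_rx_list.lock);

    		/* ...and tmp is private to this thread, so the frames
    		 * can be processed without holding any lock at all. */
    		while ((skb = __skb_dequeue(&tmp)) != NULL)
    			fcoe_recv_frame(skb);

    		/* Sleep only if nothing arrived while we were busy. */
    		spin_lock_bh(&p->fcoe_rx_list.lock);
    		if (skb_queue_empty(&p->fcoe_rx_list)) {
    			set_current_state(TASK_INTERRUPTIBLE);
    			spin_unlock_bh(&p->fcoe_rx_list.lock);
    			schedule();
    		} else {
    			spin_unlock_bh(&p->fcoe_rx_list.lock);
    		}
    	}
    	return 0;
    }

Either way, the cost of the rx_list lock is paid once per batch of
frames rather than once per frame, which is why contention drops most
in the cross-cpu case Neil describes.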
