On 02/12/15(Wed) 20:38, David Gwynne wrote:
> 
> > On 30 Nov 2015, at 9:55 PM, David Gwynne <da...@gwynne.id.au> wrote:
> > 
> > this tweaks the guts of if_start so it guarantees that there's only
> > ever one call to ifp->if_start running in the system at a time.
> > previously this was implicit because it could only be called with
> > the KERNEL_LOCK held.
> > 
> > as we move forward it would be nice to run the queue without having
> > to take the biglock. however, because we also want to dequeue the
> > packets in order, it only makes sense to run a single instance of
> > the function in the whole system.
> > 
> > also, if a driver is recovering from an oactive situation (ie, it's
> > been able to free space on the tx ring) it should be able to start
> > tx again from an mpsafe interrupt context.
> > 
> > because most of our drivers assume that they're run under the
> > KERNEL_LOCK, this diff uses a flag for the internals of the if_start
> > call to differentiate between them. it defaults to kernel locked,
> > but drivers can opt in to an mpsafe version that can call ifp->if_start
> > without the biglock held.
> > 
> > the kernel locked code takes KERNEL_LOCK and splnet before calling
> > ifp->if_start.
> > 
> > the mpsafe code uses the serialisation mechanism that the scsi
> > midlayer and pool runqueue use, but implemented with atomics instead
> > of operations under a mutex.
> > 
> > the semantic is that work will be queued onto a list protected by
> > a mutex (ie, the guts of struct ifqueue), and then a cpu will try
> > to enter a critical section that runs a function to service the
> > queued work. the cpu that enters the critical section has to dequeue
> > work in a loop, which is what all our drivers do.
> > 
> > if another cpu tries to enter the same critical section after
> > queueing more work, it will return immediately rather than spin on
> > the lock.
> > the first cpu that is currently dequeueing work in the
> > critical section will be told to spin again to guarantee that it
> > will service the work the other cpu added.
> > 
> > so the network stack may be transmitting packets on cpu1, while an
> > interrupt on cpu0 occurs which frees up tx descriptors. if cpu0
> > calls if_start, it will return immediately because cpu1 will end
> > up doing the work it wanted to do anyway.
> > 
> > if the start routine can run on multiple cpus, then it becomes
> > necessary to know it is NOT running anymore when tearing a nic down.
> > to that end i have added an if_start_barrier function. an mpsafe
> > driver can call that when it's being brought down to guarantee that
> > another cpu isn't fiddling with the tx ring before freeing it.
> > 
> > a driver opts in to the mpsafe if_start call by doing the following:
> > 
> > 1. setting ifp->if_xflags = IFXF_MPSAFE.
> > 2. calling if_start() instead of its own start routine (eg, myx_start).
> > 3. clearing IFF_RUNNING before calling if_start_barrier() on its way down.
> > 4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback).
> > 
> > anyway, this is the diff i have come up with after playing with
> > several ideas. it removes the IFXF_TXREADY semantics, ie, tx
> > mitigation, and reuses the flag bit for IFXF_MPSAFE.
> > 
> > the reason for that is that juggling or deferring the start routine
> > made if_start_barrier annoyingly complicated, and all my attempts at
> > it introduced a significant performance hit or were insanely
> > complicated.
> > 
> > tx mitigation only ever gave me back 5 to 10% before it was badly
> > tweaked, and we've made a lot of other performance improvements
> > since then. while i'm sad to see it go, i'd rather move forward than
> > dwell on it.
> > 
> > in the future i would like to try delegating the work to mpsafe
> > taskqs, but in my attempts i lost something like 30% of my tx rate
> > by doing that.
> > i'd like to investigate that further in the future,
> > just not right now.
> > 
> > finally, the last thing to consider is lock ordering problems.
> > because contention on the ifq_serializer causes the second context
> > to return immediately (that's true even if you call if_start from
> > within a critical section), i think all the problems are avoided.
> > i am more concerned with the ifq mutex than i am with the serialiser.
> > 
> > anyway, here's the diff to look at. happy to discuss further.
> > 
> > tests would be welcome too.
> 
> no tests? no opinions?

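the serialisation scheme the quoted message describes can be sketched in
userland C11. this is an illustration only: the names (struct serializer,
serialize(), pending) are invented for the example and are not the
identifiers used in the actual diff, and the real kernel code queues work
onto struct ifqueue rather than passing a function in.

```c
#include <assert.h>
#include <stdatomic.h>

/* illustrative sketch of an atomics-based serialiser */
struct serializer {
	atomic_uint pending;	/* 0 = idle, >0 = a cpu is inside */
};

static void
serialize(struct serializer *s, void (*work)(void *), void *arg)
{
	/*
	 * if another cpu is already in the critical section, it is
	 * obliged to service the work we queued, so return immediately
	 * instead of spinning on a lock.
	 */
	if (atomic_fetch_add(&s->pending, 1) != 0)
		return;

	for (;;) {
		/* drain the (externally queued) work */
		work(arg);

		/* leave only if nobody entered while we were working */
		unsigned int just_us = 1;
		if (atomic_compare_exchange_strong(&s->pending,
		    &just_us, 0))
			break;

		/* new entrants arrived: collapse them and spin again */
		atomic_store(&s->pending, 1);
	}
}

/* example work function standing in for a driver's start routine */
static void
drain(void *arg)
{
	(*(int *)arg)++;
}
```

the losing caller returns at once; the owner keeps looping until pending
drains to zero, which matches the "told to spin again" behaviour above.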
put it in.
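for completeness, the other half of the scheme ("work will be queued onto
a list protected by a mutex ... the cpu that enters the critical section
has to dequeue work in a loop") can be sketched like this. struct pktq and
its functions are invented stand-ins for the guts of struct ifqueue, using
a pthread mutex as the userland analogue of the kernel mutex:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PKTQ_SIZE 64

struct pktq {
	pthread_mutex_t mtx;	/* protects the list below */
	int pkts[PKTQ_SIZE];	/* stand-in for queued mbufs */
	size_t head, tail;
};

/* producer side: queue work before trying to enter the serialiser */
static void
pktq_enqueue(struct pktq *q, int pkt)
{
	pthread_mutex_lock(&q->mtx);
	q->pkts[q->tail++ % PKTQ_SIZE] = pkt;
	pthread_mutex_unlock(&q->mtx);
}

/* consumer side: returns 0 once the queue is drained */
static int
pktq_dequeue(struct pktq *q, int *pkt)
{
	int more = 0;

	pthread_mutex_lock(&q->mtx);
	if (q->head != q->tail) {
		*pkt = q->pkts[q->head++ % PKTQ_SIZE];
		more = 1;
	}
	pthread_mutex_unlock(&q->mtx);
	return more;
}

/* what the cpu inside the critical section does: dequeue in a loop */
static int
pktq_drain(struct pktq *q)
{
	int pkt, sent = 0;

	while (pktq_dequeue(q, &pkt))
		sent++;		/* a real driver would fill the tx ring here */
	return sent;
}
```

only list manipulation happens under the mutex; the expensive part (the
drain loop) runs outside it, in the serialised critical section.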