On 02/12/15(Wed) 20:38, David Gwynne wrote:
> 
> > On 30 Nov 2015, at 9:55 PM, David Gwynne <da...@gwynne.id.au> wrote:
> > 
> > this tweaks the guts of if_start so it guarantees that there's only
> > ever one call to ifp->if_start running in the system at a time.
> > previously this was implicit because it could only be called with
> > the KERNEL_LOCK held.
> > 
> > as we move forward it would be nice to run the queue without having
> > to take the biglock. however, because we also want to dequeue the
> > packets in order, it only makes sense to run a single instance of
> > the function in the whole system.
> > 
> > also, if a driver is recovering from an oactive situation (ie, it's
> > been able to free space on the tx ring) it should be able to start
> > tx again from an mpsafe interrupt context.
> > 
> > because most of our drivers assume that they're run under the
> > KERNEL_LOCK, this diff uses a flag for the internals of the if_start
> > call to differentiate between them. it defaults to kernel locked,
> > but drivers can opt in to an mpsafe version that can call ifp->if_start
> > without the biglock held.
> > 
> > the kernel locked code takes KERNEL_LOCK and splnet before calling
> > ifp->if_start.
> > 
> > the mpsafe code uses the serialisation mechanism that the scsi
> > midlayer and pool runqueue use, but implemented with atomics instead
> > of operations under a mutex.
> > 
> > the semantic is that work will be queued onto a list protected by
> > a mutex (ie, the guts of struct ifqueue), and then a cpu will try
> > to enter a critical section that runs a function to service the
> > queued work. the cpu that enters the critical section has to dequeue
> > work in a loop, which is what all our drivers do.
> > 
> > if another cpu tries to enter the same critical section after
> > queueing more work, it will return immediately rather than spin on
> > the lock.
> > the first cpu that is currently dequeueing work in the
> > critical section will be told to spin again to guarantee that it
> > will service the work the other cpu added.
> > 
> > so the network stack may be transmitting packets on cpu1, while an
> > interrupt on cpu0 occurs which frees up tx descriptors. if cpu0
> > calls if_start, it will return immediately because cpu1 will end
> > up doing the work it wanted to do anyway.
> > 
> > if the start routine can run on multiple cpus, then it becomes
> > necessary to know it is NOT running anymore when tearing a nic down.
> > to that end i have added an if_start_barrier function. an mpsafe
> > driver can call that when it's being brought down to guarantee that
> > another cpu isn't fiddling with the tx ring before freeing it.
> > 
> > a driver opts in to the mpsafe if_start call by doing the following:
> > 
> > 1. setting ifp->if_xflags = IFXF_MPSAFE.
> > 2. calling if_start() instead of its own start routine (eg, myx_start).
> > 3. clearing IFF_RUNNING before calling if_start_barrier() on its way down.
> > 4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback).
> > 
> > anyway, this is the diff i have come up with after playing with
> > several ideas. it removes the IFXF_TXREADY semantics, ie, tx
> > mitigation, and reuses the flag bit for IFXF_MPSAFE.
> > 
> > the reason for that is that juggling or deferring the start routine
> > made if_start_barrier annoyingly complicated, and all my attempts at
> > it introduced a significant performance hit or were insanely
> > complicated.
> > 
> > tx mitigation only ever gave me back 5 to 10% before it was badly
> > tweaked, and we've made a lot of other performance improvements
> > since then. while i'm sad to see it go, i'd rather move forward than
> > dwell on it.
> > 
> > in the future i would like to try delegating the work to mpsafe
> > taskqs, but in my attempts i lost something like 30% of my tx rate
> > by doing that.
> > i'd like to investigate that further in the future,
> > just not right now.
> > 
> > finally, the last thing to consider is lock ordering problems.
> > because contention on the ifq_serializer causes the second context
> > to return immediately (that's true even if you call if_start from
> > within a critical section), i think all the problems are avoided.
> > i am more concerned with the ifq mutex than i am with the serialiser.
> > 
> > anyway, here's the diff to look at. happy to discuss further.
> > 
> > tests would be welcome too.
> 
> no tests? no opinions?

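the serialisation scheme the quoted message describes can be sketched in
userland C11. this is an illustration only: the names (struct serializer,
serialize(), pending) are invented for the example and are not the
identifiers used in the actual diff, and the real kernel code queues work
onto struct ifqueue rather than passing a function in.

```c
#include <assert.h>
#include <stdatomic.h>

/* illustrative sketch of an atomics-based serialiser */
struct serializer {
	atomic_uint pending;	/* 0 = idle, >0 = a cpu is inside */
};

static void
serialize(struct serializer *s, void (*work)(void *), void *arg)
{
	/*
	 * if another cpu is already in the critical section, it is
	 * obliged to service the work we queued, so return immediately
	 * instead of spinning on a lock.
	 */
	if (atomic_fetch_add(&s->pending, 1) != 0)
		return;

	for (;;) {
		/* drain the (externally queued) work */
		work(arg);

		/* leave only if nobody entered while we were working */
		unsigned int just_us = 1;
		if (atomic_compare_exchange_strong(&s->pending,
		    &just_us, 0))
			break;

		/* new entrants arrived: collapse them and spin again */
		atomic_store(&s->pending, 1);
	}
}

/* example work function standing in for a driver's start routine */
static void
drain(void *arg)
{
	(*(int *)arg)++;
}
```

the losing caller returns at once; the owner keeps looping until pending
drains to zero, which matches the "told to spin again" behaviour above.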
put it in.
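for completeness, the other half of the scheme ("work will be queued onto
a list protected by a mutex ... the cpu that enters the critical section
has to dequeue work in a loop") can be sketched like this. struct pktq and
its functions are invented stand-ins for the guts of struct ifqueue, using
a pthread mutex as the userland analogue of the kernel mutex:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PKTQ_SIZE 64

struct pktq {
	pthread_mutex_t mtx;	/* protects the list below */
	int pkts[PKTQ_SIZE];	/* stand-in for queued mbufs */
	size_t head, tail;
};

/* producer side: queue work before trying to enter the serialiser */
static void
pktq_enqueue(struct pktq *q, int pkt)
{
	pthread_mutex_lock(&q->mtx);
	q->pkts[q->tail++ % PKTQ_SIZE] = pkt;
	pthread_mutex_unlock(&q->mtx);
}

/* consumer side: returns 0 once the queue is drained */
static int
pktq_dequeue(struct pktq *q, int *pkt)
{
	int more = 0;

	pthread_mutex_lock(&q->mtx);
	if (q->head != q->tail) {
		*pkt = q->pkts[q->head++ % PKTQ_SIZE];
		more = 1;
	}
	pthread_mutex_unlock(&q->mtx);
	return more;
}

/* what the cpu inside the critical section does: dequeue in a loop */
static int
pktq_drain(struct pktq *q)
{
	int pkt, sent = 0;

	while (pktq_dequeue(q, &pkt))
		sent++;		/* a real driver would fill the tx ring here */
	return sent;
}
```

only list manipulation happens under the mutex; the expensive part (the
drain loop) runs outside it, in the serialised critical section.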