> On 1 Dec 2015, at 5:40 AM, Hrvoje Popovski <hrv...@srce.hr> wrote:
>
> On 30.11.2015. 12:55, David Gwynne wrote:
>> this tweaks the guts of if_start so it guarantees that there's only
>> ever one call to ifp->if_start running in the system at a time.
>> previously this was implicit because it could only be called with
>> the KERNEL_LOCK held.
>>
>> as we move forward it would be nice to run the queue without having
>> to take the biglock. however, because we also want to dequeue the
>> packets in order, it only makes sense to run a single instance of
>> the function in the whole system.
>>
>> also, if a driver is recovering from an oactive situation (ie, it's
>> been able to free space on the tx ring) it should be able to start
>> tx again from an mpsafe interrupt context.
>>
>> because most of our drivers assume that theyre run under the
>> KERNEL_LOCK, this diff uses a flag for the internals of the if_start
>> call to differentiate between them. it defaults to kernel locked,
>> but drivers can opt in to an mpsafe version that can call ifp->if_start
>> without the mplock held.
>>
>> the kernel locked code takes KERNEL_LOCK and splnet before calling
>> ifp->if_start.
>>
>> the mpsafe code uses the serialisation mechanism that the scsi
>> midlayer and pool runqueue use, but implemented with atomics instead
>> of operations under a mutex.
>>
>> the semantic is that work will be queued onto a list protected by
>> a mutex (ie, the guts of struct ifqueue), and then a cpu will try
>> to enter a critical section that runs a function to service the
>> queued work. the cpu that enters the critical section has to dequeue
>> work in a loop, which is what all our drivers do.
>>
>> if another cpu tries to enter the same critical section after
>> queueing more work, it will return immediately rather than spin on
>> the lock.
>> the first cpu that is currently dequeueing work in the
>> critical section will be told to spin again to guarantee that it
>> will service the work the other cpu added.
>>
>> so the network stack may be transmitting packets on cpu1, while an
>> interrupt on cpu0 occurs which frees up tx descriptors. if cpu0
>> calls if_start, it will return immediately because cpu1 will end
>> up doing the work it wanted to do anyway.
>>
>> if the start routine can run on multiple cpus, then it becomes
>> necessary to know it is NOT running anymore when tearing a nic down.
>> to that end i have added an if_start_barrier function. an mpsafe
>> driver can call that when it's being brought down to guarantee that
>> another cpu isnt fiddling with the tx ring before freeing it.
>>
>> a driver opts in to the mpsafe if_start call by doing the following:
>>
>> 1. set ifp->if_xflags = IFXF_MPSAFE.
>> 2. calling if_start() instead of its own start routine (eg, myx_start).
>> 3. clearing IFF_RUNNING before calling if_start_barrier() on its way down.
>> 4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)
>>
>> anyway, this is the diff i have come up with after playing with
>> several ideas. it removes the IFXF_TXREADY semantics, ie, tx
>> mitigation and reuses the flag bit for IFXF_MPSAFE.
>>
>> the reason for that is juggling or deferring the start routine made
>> if_start_barrier annoyingly complicated, and all my attempts at it
>> introduced a significant performance hit or were insanely complicated.
>>
>> tx mitigation only ever gave me back 5 to 10% before it was badly
>> tweaked, and we've made a lot of other performance improvements
>> since then. while im sad to see it go, id rather move forward than
>> dwell on it.
>>
>> in the future i would like to try delegating the work to mpsafe
>> taskqs, but in my attempts i lost something like 30% of my tx rate
>> by doing that. id like to investigate that further in the future,
>> just not right now.
>>
>> finally, the last thing to consider is lock ordering problems.
>> because contention on the ifq_serializer causes the second context
>> to return immediately (that's true even if you call if_start from
>> within a critical section), i think all the problems are avoided.
>> i am more concerned with the ifq mutex than i am with the serialiser.
>>
>> anyway, here's the diff to look at. happy to discuss further.
>>
>> tests would be welcome too.
>
> ...and i bought 10G-PCIE2-8B2L-2S although i'm 82599 fan and i will
> never give up on IX :))))
but i dont want myx users. i can do pretty much what i like to that driver cos only two people use it. ix is "quite" popular, so needs more care to touch.
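
for anyone trying to picture the serialiser semantics described in the quoted mail, here is a rough userland sketch using C11 atomics. it is not the code from the diff: struct serializer, ser_enter, ser_leave, ser_start and the demo_* helpers are made-up names, and a single counter standing in for the real mechanism is an assumption. it only shows the "second caller returns immediately, first caller loops again" behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* hypothetical serialiser: 0 = idle, 1 = owner inside, >1 = rerun requested */
struct serializer {
	atomic_uint pending;
};

/* returns true if the caller won the critical section and must do the work */
static bool
ser_enter(struct serializer *s)
{
	return atomic_fetch_add(&s->pending, 1) == 0;
}

/* returns true if the owner may leave, false if it has to go around again */
static bool
ser_leave(struct serializer *s)
{
	unsigned int one = 1;

	if (atomic_compare_exchange_strong(&s->pending, &one, 0))
		return true;

	/* someone asked for service while we ran: collapse the requests */
	atomic_store(&s->pending, 1);
	return false;
}

/* run fn(arg) serialised; contending callers return without spinning */
static void
ser_start(struct serializer *s, void (*fn)(void *), void *arg)
{
	if (!ser_enter(s))
		return;		/* the current owner will service our work */

	do {
		fn(arg);
	} while (!ser_leave(s));
}

/* demo: the first run re-enters ser_start, standing in for another cpu */
struct demo {
	struct serializer ser;
	int runs;
};

static void
demo_work(void *arg)
{
	struct demo *d = arg;

	if (++d->runs == 1)
		ser_start(&d->ser, demo_work, d);	/* returns immediately */
}

/* returns how many times the work function ran */
static int
demo_run(void)
{
	struct demo d = { .runs = 0 };

	atomic_init(&d.ser.pending, 0);
	ser_start(&d.ser, demo_work, &d);
	assert(atomic_load(&d.ser.pending) == 0);	/* idle again */
	return d.runs;
}
```

in the demo the nested ser_start() call does not recurse into the work function. it just bumps the counter and returns, and the outer caller notices on its way out and runs the work one more time. that is the same reason calling if_start from inside the critical section is described as safe in the quoted text.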