On Mon, Oct 9, 2017 at 2:07 AM, David Laight <david.lai...@aculab.com> wrote: > From: Jeff Kirsher >> Sent: 06 October 2017 18:57 >> From: Jacob Keller <jacob.e.kel...@intel.com> >> >> Since commit 6a7fded776a7 ("i40e: Fix RS bit update in Tx path and >> disable force WB workaround") we've tried to "optimize" setting the >> RS bit based around skb->xmit_more. This same logic was refactored >> in commit 1dc8b538795f ("i40e: Reorder logic for coalescing RS bits"), >> but ultimately was not functionally changed. > > I've tried to understand the above, but without definitions of WB > and RS and the '4-ness' of something it is quite difficult. > > I THINK this is all about telling the hardware there is a packet > in the ring to transmit? > > I don't understand the 4-ness. Linux requires that the hardware > be notified of a single packet transmit, and that the 'transmit > complete' also be notified in 'a timely manner' for a single packet.
So to clarify some of this. The RS is short for Report Status. It tells the hardware that when it has completed the descriptor it should trigger a write back, aka WB. The 4-ness is because each descriptor is 16 bytes, so if we write back once every 4 descriptors that is one 64B cache line which is much better for most platforms performance wise. > skb->xmit_more ought to be usable to optimise both of these > (assuming it isn't a lie). Actually it leads to issues if we hold off for too long. If we are busy dequeueing a long list, and the Tx queue has gone empty then we are stalling Tx without need to do so. We need to be regularly notifying the device that it has buffers available to transmit which is what xmit_more is good for. However in addition the hardware needs to notify us regularly that there are buffers ready to clean which it isn't necessarily so useful for, especially if are filling the entire ring and only providing one notification to the driver that there are buffers ready since we can defer the Tx interrupt for too long. What Jake is trying to fix here is to prevent us from generating what looks like a square wave in terms of throughput. Essentially the current behavior that is being seen is that all the buffers in the ring either belong to software, or belong to hardware. It is best for us to have this split half and half so that the hardware has buffers to transmit and software has buffers to clean and enqueue new buffers onto. > The driver would need to ensure a ring full of messages isn't > generated without either wakeup - but that might be 128 frames. That is what is happening here. Basically we are guaranteeing writebacks are occurring on a regular basis instead of in large bursts. > FWIW on the systems I have PCIe writes are relatively cheap > (reads are slow). So different counts would be appropriate > for delaying doorbell writes and requesting completion interrupts. > > David The doorbell writes are still being delayed the same amount with this patch. The only thing that was split out was the device write backs are now occurring on a more regular basis instead of being held off until a much larger set is handled. - Alex