On Wed, Oct 28, 2020 at 5:47 AM Joel Stanley <j...@jms.id.au> wrote: > > On Thu, 22 Oct 2020 at 07:41, Benjamin Herrenschmidt > <b...@kernel.crashing.org> wrote: > > > > > > > > + /* Ensure the descriptor config is visible before setting the tx > > > > + * pointer. > > > > + */ > > > > + smp_wmb(); > > > > + > > > > > > Shouldn't these be paired with smp_rmb() on the reader side? > > > > (Not near the code right now) I *think* the reader already has them > > where it matters but I might have overlooked something when I quickly > > checked the other day. > > Do we need a read barrier at the start of ftgmac100_tx_complete_packet? > > pointer = priv->tx_clean_pointer; > <--- here > txdes = &priv->txdes[pointer]; > > ctl_stat = le32_to_cpu(txdes->txdes0); > if (ctl_stat & FTGMAC100_TXDES0_TXDMA_OWN) > return false; > > This was the only spot I could see that might require one.
No, I don't think this is the one, since tx_clean_pointer is not updated in the other CPU, only this one. From what I can tell, you have a communication between three concurrent threads in the TX path: a) one CPU runs start_xmit. It reads tx_clean_pointer from the second CPU, writes the descriptors for the hardware, notifies the hardware and updates tx_pointer. b) the hardware gets kicked by start_xmit, it reads the descriptors, reads the data and updates the descriptors. it may send an interrupt, which is not important here. c) a second CPU runs the poll() function. It reads the tx_pointer, reads the descriptors, updates the descriptors and updates tx_clean_pointer. Things get a bit confusing because the tx_pointer and tx_clean_pointer variables are hidden behind macros and both sides repeatedly read and write them, with no serialization. This is what I would try to untangle it: - mark both tx_pointer and tx_clean_pointer as ____cacheline_aligned_in_smp to avoid the cacheline pingpong between the two CPUs - Use smp_load_acquire() to read a pointer that may have been updated by the other thread, and smp_store_release() to update the one the other side will read. - pull these accesses into the callers and only do them once if possible - reconsider the "keep poll() going as long as tx packets are queued" logic, and instead serialize against the packets that are actually completed by the hardware. Arnd