On Fri, Mar 02, 2018 at 09:54:11AM -0500, David Miller wrote: > From: Pavel Machek <pa...@ucw.cz> > Date: Fri, 2 Mar 2018 10:20:00 +0100 >
Hello Pavel, David > >> This barrier cannot be a simple dma_wmb(), since a dma_wmb() is only > >> used to guarantee the ordering, with respect to other writes, > >> to cache coherent DMA memory. > > > > Could you explain this a bit more (and perhaps in code comment)? Have a look at: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/memory-barriers.txt?h=v4.16-rc1#n1913 AFAICT, a dma_wmb() can only be used to guarantee that the writes to cache coherent memory (e.g. memory allocated with dma_alloc_coherent()) before the dma_wmb() will be performed before the writes to cache coherent memory after the dma_wmb(). Since most of our writes are simply writing new buffer addresses and sizes to TDES0/TDES1/TDES2/TDES3, and since these TX DMA descriptors have been allocated with dma_alloc_coherent(), a dma_wmb() should be enough to e.g. make sure that TDES3 (which contains the OWN bit), is written after the writes to TDES0/TDES1/TDES2. However, the last write we do is "DMA start transmission", this is a register in the IP, i.e. it is a write to the cache incoherent MMIO region (rather than a write to cache coherent memory). To ensure that all writes to cache coherent memory have completed before we start the DMA, we have to use the barrier wmb() (which performs a more extensive flush compared to dma_wmb()). So the only place where we have to use a wmb() instead of a dma_wmb() is where we have a write to coherent memory, followed by a write to cache incoherent MMIO. The only obvious place where we have this situtation is where we write the OWN bit immediately followed by a write to the "DMA start transmission" register. Note that this also matches how it's done in other other drivers: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/amd/xgbe/xgbe-dev.c?h=v4.16-rc1#n1638 There is already a comment describing the barrier in stmmac_xmit() and stmmac_tso_xmit() that says: /* The own bit must be the latest setting done when prepare the * descriptor and then barrier is needed to make sure that * all is coherent before granting the DMA engine. */ However, if you want, we could mention wmb() explicitly in this comment. > > > > Ensuring other writes are done before writing the "GO!" bit should be > > enough, no? > > Indeed, the chip should never look at the descriptor contents unless > the GO bit is set. > > If there are ways that it can, this must be explained and documented > since it is quite unusual compared to other hardware. > > > (If it is not, do we need heavier barriers in other places, too?) > > Right. I hope that my explaination above has cleared any potential confusion. Best regards, Niklas