On Monday 22 June 2015 01:38 PM, Alexey Brodkin wrote:
> Hi all,
> 
> On Wed, 2015-06-17 at 07:03 +0000, Vineet Gupta wrote:
> +CC linux-arch, linux-mm, Arnd and Marek
> 
> On Tuesday 16 June 2015 11:11 PM, Alexey Brodkin wrote:
> 
> The current implementation of the descriptor init procedure only takes care
> of the ownership flag, while it is perfectly possible for the underlying
> memory to be filled with garbage on boot or driver installation.
> 
> And randomly set flags in non-zeroed des0 and des1 fields may lead to
> unpredictable behavior of the GMAC DMA block.
> 
> Solution to this problem is as simple as explicit zeroing of both des0
> and des1 fields of all buffer descriptors.
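
A minimal user-space sketch of the idea described above, not the actual
patch: the struct layout, the DEMO_OWN bit position and the function name
are assumptions made up for illustration and do not come from the stmmac
sources. It clears des0 and des1 explicitly before setting the ownership
flag, so stale bits in those two words cannot survive init.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified descriptor; the real layout may differ. */
struct demo_desc {
	uint32_t des0;
	uint32_t des1;
	uint32_t des2;
	uint32_t des3;
};

/* Assumed position of the ownership flag in des0. */
#define DEMO_OWN (1u << 31)

/*
 * Initialise a ring of descriptors: explicitly zero des0 and des1,
 * then hand each descriptor to the DMA engine by setting the
 * ownership flag. des2/des3 are deliberately left untouched here.
 */
static void demo_init_desc_ring(struct demo_desc *ring, size_t count)
{
	assert(ring != NULL);

	for (size_t i = 0; i < count; i++) {
		ring[i].des0 = 0;
		ring[i].des1 = 0;
		ring[i].des0 |= DEMO_OWN;
	}
}
```

Filling a ring with a garbage pattern via memset() and then calling
demo_init_desc_ring() leaves only the ownership bit set in des0 and
nothing set in des1.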
> 
> Signed-off-by: Alexey Brodkin <[email protected]>
> Cc: Giuseppe Cavallaro <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> 
> FWIW, this was causing sporadic/random networking flakiness on ARC SDP 
> platform (scheduled for upstream inclusion in next window)
> 
> This also leads to an interesting question - should 
> arch/*/dma_alloc_coherent() and friends unconditionally zero out memory (vs. 
> the current semantics of only doing it based on gfp, as requested by the 
> driver)? This is the second instance where we ran into stale descriptor 
> memory; the first one was in the dw_mmc driver, which was recently fixed 
> upstream as well (although debugged independently by Alexey and using the 
> upstream fix)
> 
> http://www.spinics.net/lists/linux-mmc/msg31600.html
> 
> The pro is a better out-of-box experience (despite buggy drivers), while the 
> cons are that those drivers remain broken and boot time perhaps increases 
> due to the extra memzero....
> 
> Probably if we already have dma_zalloc_coherent() that does explicit zeroing 
> of returned memory then there's no need to do implicit zeroing in 
> dma_alloc_coherent()?


The question is: when drivers don't use dma_zalloc_coherent() - meaning they
don't pass __GFP_ZERO, which causes these random issues - do we need to be more
conservative in arch code (ARC at least is), or do we need to debug and fix
these drivers one by one?

FWIW, ARC needs to fix the __GFP_ZERO case, since we are doing memzero twice there.
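
To make the two policies concrete, here is a rough user-space model, not
kernel code: DEMO_GFP_ZERO stands in for __GFP_ZERO, the demo_* functions
and the arch_always_zero switch are hypothetical names invented for this
sketch, and malloc() plus a garbage fill stands in for real coherent
memory. It zeroes at most once per allocation, whether the request comes
from the flag or from a conservative arch policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for __GFP_ZERO; the value is arbitrary in this model. */
#define DEMO_GFP_ZERO 0x1u

/* Counts zeroing passes so that double zeroing would be visible. */
static unsigned int demo_zero_passes;

/*
 * Model of an arch-level coherent allocator. The buffer is first
 * filled with a garbage pattern to simulate stale memory, then
 * zeroed exactly once if either the caller asked for it or the
 * arch policy is to always zero.
 */
static void *demo_alloc_coherent(size_t size, unsigned int flags,
				 bool arch_always_zero)
{
	unsigned char *buf = malloc(size);

	if (!buf)
		return NULL;

	memset(buf, 0xa5, size);	/* simulate stale contents */

	if ((flags & DEMO_GFP_ZERO) || arch_always_zero) {
		memset(buf, 0, size);
		demo_zero_passes++;
	}

	return buf;
}

/* Model of the zeroing wrapper: same call with the zero flag added. */
static void *demo_zalloc_coherent(size_t size, unsigned int flags,
				  bool arch_always_zero)
{
	return demo_alloc_coherent(size, flags | DEMO_GFP_ZERO,
				   arch_always_zero);
}
```

With arch_always_zero set to false, a driver calling demo_alloc_coherent()
without the flag gets the garbage pattern back, which is the situation the
buggy drivers are in today.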

-Vineet