On Tue, 2018-06-19 at 17:25 -0700, Eric Dumazet wrote:
>
> On 06/19/2018 11:05 AM, Saeed Mahameed wrote:
> >
> > this is only true for XDP setup, for non XDP max stride_size can
> > only be around ~3k and only for mtu > ~6k
> >
> > For XDP setup you suggested:
> > - priv->frag_info[0].frag_size = eff_mtu;
> > + priv->frag_info[0].frag_size = PAGE_SIZE;
> >
> > currently the condition is:
> >
> > release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> >
> > so my solution and yours have the same problem you described above.
> >
> > the problem is not with the initial values or with stride/frag size
> > math, it's just that in XDP we shouldn't reuse at ALL. I agree with
> > you that we need to optimize and maybe for PAGE_SIZE > 8k we need to
> > allow XDP setup to reuse. but for now there is a data corruption to
> > handle.
> >
>
> Sure, we all agree there is a bug to fix.
>
> The way you are fixing it is kind of illogical.
>
> The NIC can use a frag if its _size_ is big enough to receive the
> frame.
>
> The _stride_ is an abstraction created by the driver to report an
> estimation of the _truesize_, or memory consumption, so that Linux can
> better track overall memory usage.
>
> For example, if MTU=1500, the size of the fragment is 1536 bytes, but
> since we can put only 2 fragments per 4KB page (on x86), we declare the
> _stride_ to be 2048 bytes.
>
> Declaring that a final blob of a page, being 1600 bytes, cannot receive
> a frame because _stride_ is 2048 is illogical and wastes resources.
>
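The size-versus-stride arithmetic in Eric's example can be checked with a minimal sketch. It assumes 4 KB pages and 64-byte cache lines. The helpers `align_up`, `sketch_stride`, `fits_by_size` and `fits_by_stride` and the `SKETCH_*` constants are hypothetical illustrations modeled on the description above, not mlx4 code. The stride estimate spreads the unused tail of the page over the frags that fit, which yields the 1536-byte size and 2048-byte stride quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values from the example: 4 KB pages (x86), 64-byte cache lines. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_CACHE_BYTES 64u

/* Round x up to a multiple of a, where a is a power of two. */
static unsigned int align_up(unsigned int x, unsigned int a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical stride (truesize) estimate: spread the unused tail of the
 * page evenly over the frags that fit in it, so the reported memory use
 * covers the whole page. Assumes frag_size fits in one page.
 */
static unsigned int sketch_stride(unsigned int frag_size)
{
	unsigned int aligned = align_up(frag_size, SKETCH_CACHE_BYTES);
	unsigned int nb, pad;

	assert(aligned > 0 && aligned <= SKETCH_PAGE_SIZE);
	nb = SKETCH_PAGE_SIZE / aligned;	/* frags per page */
	pad = (SKETCH_PAGE_SIZE - nb * aligned) / nb;
	pad &= ~(SKETCH_CACHE_BYTES - 1);	/* keep the stride cache-aligned */
	return aligned + pad;
}

/* Size-based test: the remaining space can hold one more frame. */
static bool fits_by_size(unsigned int page_offset, unsigned int frag_size)
{
	return page_offset + frag_size <= SKETCH_PAGE_SIZE;
}

/* Stride-based test: stricter, so it gives up on usable tail space. */
static bool fits_by_stride(unsigned int page_offset, unsigned int stride)
{
	return page_offset + stride <= SKETCH_PAGE_SIZE;
}
```

With a 1600-byte tail (page_offset = 2496), fits_by_size(2496, 1536) holds while fits_by_stride(2496, sketch_stride(1536)) does not. That gap is the wasted space described above.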
I see. I wanted to use _stride_ as a guarantee of how much a page frag
can grow. For example, in mlx5 we need the whole stride to build_skb
around the frag, since we always need the trailer, but it is different
here and we can avoid the resource waste.

So how about this (as suggested by Martin)?

Currently, mlx4_en_complete_rx_desc assumes that priv->rx_headroom is
always 0 in the non-XDP setup, hence:

	frags->page_offset += sz_align;

where it really should be:

	frags->page_offset += sz_align + priv->rx_headroom;

We can use rx_headroom as a hint not to reuse the page, as below.
What do you think?

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9f54ccbddea7..f14c7a574cc8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 {
 	const struct mlx4_en_frag_info *frag_info = priv->frag_info;
 	unsigned int truesize = 0;
+	bool release = true;
 	int nr, frag_size;
 	struct page *page;
 	dma_addr_t dma;
-	bool release;
 
 	/* Collect used fragments while replacing them in the HW descriptors */
 	for (nr = 0;; frags++) {
@@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 			release = page_count(page) != 1 ||
 				  page_is_pfmemalloc(page) ||
 				  page_to_nid(page) != numa_mem_id();
-		} else {
+		} else if (!priv->rx_headroom) {
 			u32 sz_align = ALIGN(frag_size, SMP_CACHE_BYTES);
 
 			frags->page_offset += sz_align;
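The release decision in the proposed hunk can be modeled outside the driver with the hedged sketch below. `struct sketch_priv`, `struct sketch_frag`, `sketch_should_release()` and the `page_ok` flag are hypothetical simplifications, not mlx4 code. `page_ok` stands in for the `page_count()`, `page_is_pfmemalloc()` and `numa_mem_id()` checks. The sketch assumes, as the mail states, that rx_headroom is 0 without XDP. It also assumes that the XDP stride is not PAGE_SIZE / 2. Under those assumptions it shows why initializing `release` to true matters: with a nonzero rx_headroom, neither branch assigns `release`, so the page is never reused.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed: 4 KB pages and 64-byte cache lines. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_CACHE_BYTES 64u

/* Hypothetical, simplified stand-ins for the mlx4 driver state. */
struct sketch_priv {
	unsigned int rx_headroom;	/* assumed 0 without XDP, nonzero with XDP */
	unsigned int frag_size;		/* configured fragment size */
	unsigned int frag_stride;	/* reported truesize per fragment */
};

struct sketch_frag {
	unsigned int page_offset;
};

/* Round x up to a multiple of a, where a is a power of two. */
static unsigned int align_up(unsigned int x, unsigned int a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Models one loop iteration of the proposed logic. rx_len is the number
 * of bytes received into this frag. Returns true if the page must be
 * released rather than reused.
 */
static bool sketch_should_release(const struct sketch_priv *priv,
				  struct sketch_frag *frag,
				  unsigned int rx_len, bool page_ok)
{
	bool release = true;	/* default: never reuse */

	if (priv->frag_stride == SKETCH_PAGE_SIZE / 2) {
		/* Two frags per page: flip to the other half. */
		frag->page_offset ^= SKETCH_PAGE_SIZE / 2;
		release = !page_ok;
	} else if (!priv->rx_headroom) {
		/* Non-XDP: advance past the aligned data just received. */
		frag->page_offset += align_up(rx_len, SKETCH_CACHE_BYTES);
		release = frag->page_offset + priv->frag_size > SKETCH_PAGE_SIZE;
	}
	/* With rx_headroom != 0 neither branch runs, so release stays true. */
	return release;
}
```

For example, with rx_headroom = 256 and a full-page stride, the function returns true and leaves page_offset untouched. With rx_headroom = 0, frag_size = 1024 and stride 1024, receiving 700 bytes at offset 0 advances page_offset to 704 and keeps the page.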