On Tue, Feb 7, 2017 at 11:05 AM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Tue, Feb 07, 2017 at 08:26:23AM -0800, Eric Dumazet wrote:
>> On Tue, 2017-02-07 at 08:06 -0800, Eric Dumazet wrote:
>>
> Awesome that you've started working on this. I think it's the correct
> approach, and mlx5 should be cleaned up in a similar way.
> Long term we should be able to move all page alloc/free out of the
> drivers completely.
>
>> >            /*
>> >             * make sure we read the CQE after we read the ownership bit
>> >             */
>> >            dma_rmb();
>> > +          prefetch(frags[0].page);
>>
>> Note that I would like to instead do a prefetch(frags[1].page)
>
> yeah, these two look weird:
> +              prefetch(frags[0].page);
> +              va = page_address(frags[0].page) + frags[0].page_offset;
>
> on most archs page_address() is just math (not a load from memory),
> but the result != frags[0].page, so I'm missing what you are trying to
> prefetch?
>
> prefetch(frags[1].page)
> is even more confusing. what will it prefetch?
The "struct page" of the following frame.

Remember we need:

    release = page_count(page) != 1 ||
              page_is_pfmemalloc(page) ||
              page_to_nid(page) != numa_mem_id();

Then:

    page_ref_inc(page);

My patch now does:

    prefetch(frags[priv->num_frags].page);

> btw we had a patch that was doing a prefetch of 'va' of the next packet
> and it was very helpful. Like this:

I preferred to prefetch the second cache line of this frame, because TCP
is mostly used with timestamps: a total of 66 bytes of headers with IPv4,
and more for IPv6 of course.

>                pref_index = (index + 1) & ring->size_mask;
>                pref = ring->rx_info + (pref_index << priv->log_rx_info);
>                prefetch(page_address(pref->page) + pref->page_offset);
>
> but since you're redesigning ring->rx_info... not sure how it will fit.
>
>> So I will probably change how ring->rx_info is allocated
>>
>> wasting all that space and forcing vmalloc() is silly :
>>
>>      tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
>>                                      sizeof(struct mlx4_en_rx_alloc));
>
> I think you'd still need roundup_pow_of_two otherwise the
> priv->log_rx_info optimization won't work.

No more log_rx_info trick. Simply:

    frags = priv->rx_info + (index * priv->rx_info_bytes_per_slot);

A multiply is damn fast these days compared to cache misses.

Using 24*<rx_ring_size> bytes is better than 32*<rx_ring_size>; our
L1/L2 caches are quite small.

Of course, this applies to the 'stress' mode, not the light mode where we
receive a single packet per IRQ.