On Tue, 2017-02-07 at 08:06 -0800, Eric Dumazet wrote:

>               /*
>                * make sure we read the CQE after we read the ownership bit
>                */
>               dma_rmb();
> +             prefetch(frags[0].page);

Note that I would instead like to do a prefetch(frags[1].page).

So I will probably change how ring->rx_info is allocated.

Wasting all that space and forcing vmalloc() is silly:

tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
                                sizeof(struct mlx4_en_rx_alloc));
ring->rx_info = vzalloc_node(tmp, node);

In most cases, using exactly 12 bytes per slot would allow better
packing. Only one CPU uses this area, so there is no need to force
strange alignments just to avoid a multiply!



