On 09/02/2017 6:44 PM, Eric Dumazet wrote:
On Thu, Feb 9, 2017 at 8:41 AM, Tariq Toukan <ttoukan.li...@gmail.com> wrote:
Hi Eric,

Thanks again for your series.

On 09/02/2017 3:58 PM, Eric Dumazet wrote:

As mentioned half a year ago, we had better switch the mlx4 driver to order-0
allocations and page recycling.

This reduces vulnerability surface thanks to better skb->truesize
tracking and provides better performance in most cases.

v2 provides a new ethtool -S counter (rx_alloc_pages) and
code factorization, plus Tariq's fix.

I see that you made significant changes to the previous series, especially
patch 14 (RX CQE processing).
Please note that our work week has just finished here in Israel.
I will review the series, especially the new patches (10 to 14), on Sunday.

We need to test this series again in our functional and performance
regression systems.
It will be running during the weekend, so we can analyze the results and
update you on Sunday.

Previous performance results showed a degradation, especially in:
- TCP single stream at 64KB length.
What RX ring size are you using? I have not seen this at all.
Default, out of box.

- TCP 16 streams at 1KB length.
TCP does not really care, it coalesces all these into TSO skbs, full size...
But the kernel stack has to split it back accordingly on the receive side, no?

This was probably because the cache was too small, and many page allocations
were needed.
In CX4, we saw the same kind of degradation, much clearer and amplified as
it's 2.5 times faster (100G).

Regards,
Tariq Toukan
