On 21 Aug 2019, at 16:53, Magnus Karlsson wrote:
On Wed, Aug 21, 2019 at 4:14 PM Magnus Karlsson <magnus.karls...@gmail.com> wrote:
On Wed, Aug 21, 2019 at 3:46 PM Eelco Chaudron <echau...@redhat.com> wrote:
On 21 Aug 2019, at 15:11, Magnus Karlsson wrote:
On Wed, Aug 14, 2019 at 3:51 PM Eelco Chaudron <echau...@redhat.com> wrote:
When an AF_XDP application receives X packets, it does not mean X frames can be stuffed into the producer ring. To make it easier for AF_XDP applications, this API allows them to check how many frames can be added to the ring.
The patch below looks like a name change only, but the xsk_prod__ prefix denotes that this API is exposed to be used by applications. Besides, if you set the nb value to the size of the ring, you will get the exact number of slots available, at the cost of performance (you touch shared state for sure). nb is there to limit the touching of the shared state.
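For reference, here is a minimal sketch of how such a free-slot check can work, modeled on the cached producer/consumer indices that libbpf keeps per ring. The struct layout and the prod_nb_free() name are illustrative assumptions, not necessarily the exact API exposed by the patch:

	/* Illustrative sketch only: names and layout are assumptions, not
	 * the exact API exposed by the patch.
	 */
	#include <linux/types.h>

	struct prod_ring {
		__u32 cached_prod;	/* local copy of the producer index */
		__u32 cached_cons;	/* local copy of the consumer index */
		__u32 size;		/* number of descriptors in the ring */
		__u32 *consumer;	/* consumer index shared with the kernel */
	};

	/* Return how many entries can still be produced. The shared
	 * consumer index is only read when the cached view cannot satisfy
	 * 'nb', so a small 'nb' limits how often shared state is touched,
	 * while nb == ring size always refreshes it and gives the exact
	 * free count.
	 */
	static inline __u32 prod_nb_free(struct prod_ring *r, __u32 nb)
	{
		__u32 free_entries = r->cached_cons - r->cached_prod;

		if (free_entries >= nb)
			return free_entries;

		/* cached_cons is kept 'size' ahead so that the subtraction
		 * below directly yields the number of free slots.
		 */
		r->cached_cons = *r->consumer + r->size;
		return r->cached_cons - r->cached_prod;
	}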
The example xdpsock application has also been modified to use this new API, so it is able to process flows at a 1pps rate on veth interfaces.
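As a usage sketch (building on the prod_nb_free() helper above, with the rx/tx plumbing elided), an l2fwd-style loop would cap each batch at the number of free TX slots before moving descriptors; this is not the actual xdpsock code from the patch:

	/* Usage sketch building on prod_nb_free() above; not the actual
	 * xdpsock code from the patch.
	 */
	static void forward_batch(struct prod_ring *tx, __u32 rcvd)
	{
		/* Cap the batch at what the TX ring can take right now. */
		__u32 free_tx = prod_nb_free(tx, rcvd);

		if (free_tx < rcvd)
			rcvd = free_tx;

		/* ... reserve 'rcvd' TX descriptors, swap MAC addresses,
		 * submit them, then release the same number of RX
		 * descriptors ...
		 */
	}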
1 pps! That is not that impressive ;-).
My apologies for the late reply, and thank you for working on this. So what kind of performance difference do you see with your modified xdpsock application on a regular NIC for txpush and l2fwd? If there is basically no difference, or it is faster, we can go ahead and accept this. But if the difference is large, we might consider having two versions of txpush and l2fwd, as the regular NICs do not need this. Or we could optimize your code so that it becomes as fast as the previous version.
For both operation modes, I ran five tests with and without the changes applied, using an ixgbe NIC connected to a XENA tester. The throughput numbers were within the standard deviation, so there was no noticeable performance gain or drop.
Sounds good, but let me take your patches for a run on something faster, just to make sure we are CPU bound. Will get back.
I ran some experiments, and with two cores (app on one, softirq on another) there is no impact since the application core has cycles to spare. But if you run it on a single core, the drop is 1-2% for l2fwd. I think this is OK since your version is a better example and more correct. Just note that your patch did not apply cleanly to bpf-next, so please rebase it, resubmit, and I will ack it.
Just sent out a v5 which is a tested rebase on the latest bpf-next.
<SNIP>