On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:
> Hi,
> 
> there were many requests for performance numbers in the past, but not
> everyone has access to 10/40G NICs, and we need a common way to talk
> about RX-path performance without the overhead of driver RX. That's
> especially important when making changes to netif_receive_skb.

Well, in real life, fetching the RX descriptor and the packet headers is
the main cost, and skb->users == 1 (the skb is not shared).

So it's nice to try to optimize netif_receive_skb(), but make sure you
have something that really exercises the same code flows/stalls;
otherwise you'll be tempted by the wrong optimizations.

I would, for example, use a ring buffer, so that each skb you provide to
netif_receive_skb() has cold cache lines (at least skb->head, if you want
to mimic build_skb() or napi_get_frags()/napi_reuse_skb() behavior).
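
Something like this (untested) sketch, where build_test_skb() and
RING_SIZE are made-up names for illustration; the trick is that the skb
injected on each pass was built a full ring earlier, so its cache lines
(skb->head included) have been evicted in the meantime:

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

#define RING_SIZE 512	/* sized so the ring footprint exceeds the LLC */

static struct sk_buff *ring[RING_SIZE];

/* made-up helper: build a minimal frame for injection */
static struct sk_buff *build_test_skb(struct net_device *dev)
{
	struct sk_buff *skb = alloc_skb(256, GFP_ATOMIC);

	if (!skb)
		return NULL;
	skb_put(skb, ETH_ZLEN);
	skb->protocol = eth_type_trans(skb, dev);
	return skb;
}

static void bench_rx(struct net_device *dev, int rounds)
{
	int i, r;

	for (i = 0; i < RING_SIZE; i++)
		ring[i] = build_test_skb(dev);

	for (r = 0; r < rounds; r++) {
		for (i = 0; i < RING_SIZE; i++) {
			struct sk_buff *skb = ring[i];

			/* refill the slot first: the fresh skb will not
			 * be injected until a full ring from now, by
			 * which time its cache lines are cold again */
			ring[i] = build_test_skb(dev);
			if (skb)
				netif_receive_skb(skb);
		}
	}
}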

Also, this model of flooding one cpu (no irqs, no context switches)
masks latencies caused by code size, since the icache stays fully
populated with a very specialized working set.

If we want to pursue this model (as user space DPDK-style frameworks
do), we might have to design a very different model from the IRQ-driven
one, dedicating one or more cpu threads to run networking code with no
state transitions.
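
Roughly, that could look like a kernel thread pinned to one core,
busy-polling instead of sleeping on interrupts. In this (untested)
sketch, struct my_rx_ring and poll_ring() are stand-ins for whatever
driver-level hook such a design would expose:

#include <linux/kthread.h>
#include <linux/sched.h>

/* Spin forever on one cpu, draining RX and feeding the stack:
 * no irq entry/exit, no context switch, no state transition.
 */
static int net_poll_thread(void *arg)
{
	struct my_rx_ring *ring = arg;	/* hypothetical per-device ring */

	while (!kthread_should_stop()) {
		if (!poll_ring(ring))	/* hypothetical poll hook */
			cpu_relax();	/* nothing pending, keep spinning */
	}
	return 0;
}

/* caller still has to wake the thread with wake_up_process() */
static struct task_struct *start_net_poll(struct my_rx_ring *ring, int cpu)
{
	return kthread_create_on_cpu(net_poll_thread, ring, cpu, "netpoll/%u");
}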

