On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:
> Hi,
>
> there were many requests for performance numbers in the past, but not
> everyone has access to 10/40G nics and we need a common way to talk
> about RX path performance without overhead of driver RX. That's
> especially important when making changes to netif_receive_skb.
Well, in real life, having to fetch the RX descriptor and packet headers
is the main cost, and skb->users == 1.

So it's nice to try to optimize netif_receive_skb(), but make sure you
have something that really exercises the same code flows/stalls;
otherwise you'll be tempted by the wrong optimizations.

I would for example use a ring buffer, so that each skb you provide to
netif_receive_skb() has cold cache lines (at least skb->head, if you
want to mimic build_skb() or napi_get_frags()/napi_reuse_skb()
behavior). A rough sketch is at the end of this mail.

Also, this model of flooding one cpu (no irqs, no context switches)
masks latencies caused by code size, since the icache is fully
populated with a very specialized working set.

If we want to pursue this model (like user space, DPDK and similar
frameworks), we might have to design a very different model than the
IRQ-driven one, by dedicating one or more cpu threads to run networking
code with no state transitions.
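Something along these lines (completely untested, and every constant in
it -- RING_SIZE, PKT_LEN, LOOPS, the choice of "lo" -- is an arbitrary
placeholder) is what I mean by a ring:

/*
 * Untested sketch of the cold-cache ring idea, NOT a finished patch.
 * Pre-build enough skbs that by the time we wrap around the ring,
 * skb->head has been evicted from the caches, so netif_receive_skb()
 * sees cold lines the way it would behind a real NIC using build_skb().
 */
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <linux/ktime.h>
#include <net/net_namespace.h>

#define RING_SIZE	8192	/* want RING_SIZE * truesize > LLC size */
#define PKT_LEN		60
#define LOOPS		1000000UL

static struct sk_buff *ring[RING_SIZE];

static int __init rxbench_init(void)
{
	struct net_device *dev;
	int i, ret = -ENOMEM;
	unsigned long n;
	ktime_t t0, t1;

	/* any device that is UP would do, "lo" is just convenient */
	dev = dev_get_by_name(&init_net, "lo");
	if (!dev)
		return -ENODEV;

	for (i = 0; i < RING_SIZE; i++) {
		struct sk_buff *skb = netdev_alloc_skb(dev, PKT_LEN);

		if (!skb)
			goto out;
		skb_put(skb, PKT_LEN);
		/* all-zero frame: no ptype handler, dropped early */
		memset(skb->data, 0, PKT_LEN);
		skb->protocol = eth_type_trans(skb, dev);
		ring[i] = skb;
	}

	local_bh_disable();	/* a real bench would chunk this to avoid
				 * softlockup warnings on long runs */
	t0 = ktime_get();
	for (n = 0; n < LOOPS; n++) {
		struct sk_buff *skb = ring[n % RING_SIZE];

		/* extra ref so the drop path kfree_skb() leaves
		 * the ring entry alive for the next lap */
		netif_receive_skb(skb_get(skb));
	}
	t1 = ktime_get();
	local_bh_enable();

	pr_info("rxbench: %lu packets in %lld ns\n",
		LOOPS, ktime_to_ns(ktime_sub(t1, t0)));
	ret = -EAGAIN;	/* do the work at load time, then refuse to stay */
out:
	for (i = 0; i < RING_SIZE; i++)
		kfree_skb(ring[i]);	/* kfree_skb(NULL) is a no-op */
	dev_put(dev);
	return ret;
}
module_init(rxbench_init);
MODULE_LICENSE("GPL");

The only real requirement on RING_SIZE is that the ring working set
exceeds the LLC, so that wrapping around guarantees the cache misses
you would pay behind real hardware.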