This series introduces a deferred enqueue API for the graph library that
simplifies node development while maintaining performance.

The current node implementations use a manual speculation pattern where
each node pre-allocates destination buffer slots, tracks which packets
diverge from the speculated edge, and handles fixups at the end. This
results in complex boilerplate code with multiple local variables
(to_next, from, held, last_spec), memcpy calls, and stream get/put
operations repeated across every node.

The new rte_node_enqueue_deferred() API handles this automatically:

- Tracks runs of consecutive packets going to the same edge
- Flushes runs in bulk when the edge changes
- Uses rte_node_next_stream_move() (pointer swap) when all packets go to
  the same destination
- Preserves last_edge across invocations for cross-batch speculation

The deferred state is stored in the node's fast-path cache line 1,
alongside xstat_off, keeping frequently accessed data together.

Performance was measured with l3fwd forwarding between two ports of an
Intel E810-XXV 2x25G NIC (1 RX queue per port). Two graph worker threads
ran on hyper threads of the same physical core on an Intel Xeon Silver
4316 CPU @ 2.30GHz.

Results:

- Baseline (manual speculation): 37.0 Mpps
- Deferred API: 36.2 Mpps (-2.2%)

The slight overhead comes from per-packet edge comparisons. However,
this is offset by:

- 826 fewer lines of code across 13 node implementations
- Reduced instruction cache pressure from simpler code paths
- Elimination of per-node speculation boilerplate
- Easier development of new nodes

Robin Jarry (3):
  graph: optimize rte_node_enqueue_next to batch by edge
  graph: add deferred enqueue API for batch processing
  node: use deferred enqueue API in process functions

 app/graph/ip4_output_hook.c         |  35 +-------
 lib/graph/graph_populate.c          |   1 +
 lib/graph/rte_graph_worker_common.h |  90 ++++++++++++++++++-
 lib/node/interface_tx_feature.c     | 105 +++-------------------
 lib/node/ip4_local.c                |  36 +-------
 lib/node/ip4_lookup.c               |  37 +-------
 lib/node/ip4_lookup_fib.c           |  36 +-------
 lib/node/ip4_lookup_neon.h          | 100 ++-------------------
 lib/node/ip4_lookup_sse.h           | 100 ++-------------------
 lib/node/ip4_rewrite.c              | 120 +++---------------------
 lib/node/ip6_lookup.c               |  95 ++------------------
 lib/node/ip6_lookup_fib.c           |  34 +-------
 lib/node/ip6_rewrite.c              | 118 +++---------------------
 lib/node/pkt_cls.c                  | 130 +++-------------------------
 lib/node/udp4_input.c               |  42 +--------
 15 files changed, 170 insertions(+), 909 deletions(-)

-- 
2.52.0
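
For readers who want to see the run-tracking idea in isolation, here is
a minimal standalone sketch in plain C. It is a toy model only: it does
not use DPDK, and every name in it (toy_edge, toy_deferred, toy_flush,
toy_enqueue_deferred, toy_finish, toy_process) is hypothetical. It does
not reproduce the signature or the internals of
rte_node_enqueue_deferred(); it only illustrates "track a run, flush it
in bulk when the edge changes, keep last_edge for the next batch".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_EDGES 4
#define MAX_BURST 32

/* Hypothetical stand-in for a per-edge output stream. */
struct toy_edge {
	void *objs[MAX_BURST];
	uint16_t count;      /* objects queued on this edge */
	uint16_t bulk_calls; /* number of bulk flushes received */
};

/* Hypothetical deferred state describing the current run. */
struct toy_deferred {
	uint16_t last_edge; /* edge of the current run, kept across batches */
	uint16_t run_start; /* batch index where the current run begins */
};

/* Copy one run of n objects to its edge in a single bulk operation. */
static void
toy_flush(struct toy_edge *edges, uint16_t edge, void **objs, uint16_t n)
{
	struct toy_edge *e = &edges[edge];

	if (n == 0)
		return;
	assert(edge < MAX_EDGES && e->count + n <= MAX_BURST);
	memcpy(&e->objs[e->count], objs, n * sizeof(objs[0]));
	e->count += n;
	e->bulk_calls++;
}

/* Per-packet step: one edge comparison, flush only when the edge changes. */
static inline void
toy_enqueue_deferred(struct toy_deferred *d, struct toy_edge *edges,
		     void **objs, uint16_t idx, uint16_t edge)
{
	if (edge != d->last_edge) {
		toy_flush(edges, d->last_edge, &objs[d->run_start],
			  (uint16_t)(idx - d->run_start));
		d->last_edge = edge;
		d->run_start = idx;
	}
}

/*
 * End of batch: flush the trailing run. If run_start is still 0 here, the
 * whole batch went to one edge; that is the case where the cover letter
 * says the real API swaps stream pointers instead of copying.
 */
static void
toy_finish(struct toy_deferred *d, struct toy_edge *edges,
	   void **objs, uint16_t nb_objs)
{
	toy_flush(edges, d->last_edge, &objs[d->run_start],
		  (uint16_t)(nb_objs - d->run_start));
	d->run_start = 0; /* last_edge is deliberately preserved */
}

/* Process one batch given a precomputed next edge for every object. */
static void
toy_process(struct toy_deferred *d, struct toy_edge *edges,
	    void **objs, const uint16_t *next, uint16_t nb_objs)
{
	uint16_t i;

	for (i = 0; i < nb_objs; i++)
		toy_enqueue_deferred(d, edges, objs, i, next[i]);
	toy_finish(d, edges, objs, nb_objs);
}
```

In this model, a batch whose next edges are 1,1,1,2,2,1 is delivered in
three bulk flushes (three objects to edge 1, two to edge 2, one to
edge 1), and last_edge remains 1 for the following batch.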

