On Thu, Feb 05, 2026 at 02:29:53PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 04, 2026 at 05:34:01PM -0800, Jakub Kicinski wrote:
> > On Thu, 5 Feb 2026 02:59:01 +0200 Vladimir Oltean wrote:
> > > Thanks! This is an extremely subtle corner case. I appreciate the
> > > patch and explanation.
> > >
> > > I did run tests on the blamed commit (which I still have), but to
> > > catch a real issue in a meaningful way it would have been required
> > > to have a program which calls bpf_xdp_adjust_tail() with a very
> > > large offset. I'm noting that I'm seeing the WARN_ON() much easier
> > > after your fix, but before, it was mostly inconsequential for
> > > practical cases.
> > >
> > > Namely, the ENETC truesize is 2048, and XDP_PACKET_HEADROOM is 256.
> > > First buffers also contain the skb_shared_info (320 bytes), while
> > > subsequent buffers don't.
> >
> > I can't wrap my head around this series, hope you can tell me where
> > I'm going wrong. AFAICT enetc splits the page into two halves for
> > small MTU.
> >
> > So we have
> >
> > |             2k              |             2k              |
> > ----------------------------- -----------------------------
> > | hroom | data | troom/shinfo | hroom | data | troom/shinfo |
> > ----------------------------- -----------------------------
> >
> > If we attach the second chunk as frag we'll have:
> > offset = 2k + hroom
> > size = data.len
> > But we use
> > truesize / frag_size = 2k
> > so
> > tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
> > tailroom = 2k - data.len - 2k
> > tailroom = -data.len
> > WARN(tailroom < 0) -> yes
> >
> > The frag_size thing is unusable for any driver that doesn't hand out
> > full pages to frags?
>
> This is an excellent question.
>
> Yes, you're right, bpf_xdp_frags_increase_tail() only has a 50% chance
> of working - the paged data has to be in the first half of the page,
> otherwise the tailroom calculations are not correct due to
> rxq->frag_size, and the WARN_ON() will trigger.
>
> The reason why I didn't notice this during my testing is stupid. I was
> attaching the BPF program to the interface and then detaching it after
> each test, and each test was sending less than the RX ring size (2048)
> worth of packets. So all multi-buffer frames were using buffers which
> were fresh out of enetc_setup_rxbdr() -> ... -> enetc_new_page() (first
> halves) and never out of flipped pages (enetc_bulk_flip_buff()).
>
> This seems to be a good reason to convert this driver to use page pool,
> which I can look into. I'm not sure that there's anything that can be
> done to make the rxq->frag_size mechanism compatible with the current
> buffer allocation scheme.
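To put the quoted arithmetic in one place, here is a tiny standalone
sketch of the same computation (plain userspace C, not kernel code; the
2K halves, the 256 bytes of headroom and the frag offsets follow the
diagram above, and the helper name is made up):

#include <stdio.h>

/* Toy model of the tailroom formula quoted above:
 *   tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag)
 * for a driver that splits one 4K page into two 2K Rx buffers.
 */

#define PAGE_SZ	4096
#define HALF	(PAGE_SZ / 2)	/* enetc truesize / rxq->frag_size */
#define HROOM	256		/* XDP_PACKET_HEADROOM */

static int tailroom(int frag_size, int frag_off, int frag_len)
{
	return frag_size - frag_len - frag_off;
}

int main(void)
{
	int data_len = 1000;	/* arbitrary frag payload length */

	/* Frag backed by the first half of the page: the offset is
	 * small, so the formula yields a sane, positive tailroom.
	 */
	printf("first half:  tailroom = %d\n",
	       tailroom(HALF, HROOM, data_len));

	/* Frag backed by the second (flipped) half: skb_frag_off() is
	 * already 2K + hroom while frag_size still says 2K, so the
	 * result goes negative and the WARN_ON() described above fires.
	 */
	printf("second half: tailroom = %d\n",
	       tailroom(HALF, HALF + HROOM, data_len));

	return 0;
}

Which is the "only works 50% of the time" behaviour from the quote:
whether bpf_xdp_adjust_tail() can grow the frag depends purely on which
half of the page the buffer happens to come from.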
I was just about to send an answer; it seems the mistake here is mine.

I actually think adjusting the tail should work if we set rxq->frag_size
to PAGE_SIZE in enetc and to i40e_rx_pg_size() in i40e, and not to
(PAGE_SIZE / 2) as I did at first, but in that case the name "frag_size"
is just utterly wrong. Glad Jakub has pointed this out.

ice and idpf are fine, since they use libeth for Rx buffers, so
multi-buf packets always reside in 3K+ buffers. But for xsk_buff_pool it
seems like all drivers should use PAGE_SIZE as frag_size?

I will let the discussion go on for at least a day and then send v2 with
the patches reordered and those sizes corrected, maybe with ZC fixes
added on top.
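A quick back-of-the-envelope check of why the full page size should be
enough here (again just the formula with made-up numbers, not driver
code): the same frag in the second half of the page, computed once with
the half-page frag_size and once with the whole page:

#include <stdio.h>

#define PAGE_SZ	4096
#define HALF	(PAGE_SZ / 2)
#define HROOM	256

int main(void)
{
	int data_len = 1000;		/* arbitrary frag payload length */
	int frag_off = HALF + HROOM;	/* frag lives in the second half */

	/* Current half-page value: goes negative, tail growth refused. */
	printf("frag_size = 2K:        tailroom = %d\n",
	       HALF - data_len - frag_off);

	/* Proposed full-page value: reports the real room left between
	 * the end of the data and the end of the page.
	 */
	printf("frag_size = PAGE_SIZE: tailroom = %d\n",
	       PAGE_SZ - data_len - frag_off);

	return 0;
}

Which is also why the name stops fitting: with PAGE_SIZE passed in, the
value effectively describes where the usable memory behind the frag ends
within the page, not how big one Rx buffer is.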

