Hello Willem, I was able to do scp without any problems with the patch. I agree with your reasoning of the patch.
Thanks for your attention to this problem. Thanks, Anand On Tue, Nov 20, 2018 at 3:22 AM Willem de Bruijn <willemdebruijn.ker...@gmail.com> wrote: > > On Mon, Nov 19, 2018 at 12:54 PM Willem de Bruijn > <willemdebruijn.ker...@gmail.com> wrote: > > > > On Mon, Nov 19, 2018 at 6:00 AM Anand H. Krishnan > > <anandhkrish...@gmail.com> wrote: > > > > > > Hello, > > > > > > I tried the 4.19.2 kernel without success. You were probably right > > > that skb_orphan > > > is indeed called from somewhere in the receive path. I had an > > > instrumented kernel > > > and the following is what I see: > > > > Thanks for verifying. > > > > I was also able to reproduce it without veth by looping a packet > > to another packet socket that delays reading from the socket. > > > > A very preliminary patch is to mark the socket as zerocopy in > > tpacket_fill_skb to trigger skb_copy_ubufs whenever the > > packet gets cloned or enters the receive path: > > > > @@ -2462,6 +2462,9 @@ static int tpacket_fill_skb(struct packet_sock > > *po, struct sk_buff *skb, > > skb->tstamp = sockc->transmit_time; > > sock_tx_timestamp(&po->sk, sockc->tsflags, > > &skb_shinfo(skb)->tx_flags); > > skb_shinfo(skb)->destructor_arg = ph.raw; > > + skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG; > > > > This requires at the least a fixup in skb_zcopy_clear to handle this > > special case and not call uarg->callback (because > > skb_shinfo(skb)->destructor_arg is of a different type for > > tpacket_snd). I'll share a patch to test when I have something > > more concrete. > > These packets already call an skb destructor when the original skb is > released, tpacket_destruct_skb, to release the slot and record the > timestamp of original submission. > > We cannot change that timestamp until all skbs are freed. And, we also > do not want two indirect function calls, one for tpacket on the original > skb and one for zcopy on the last skb clone. > > Instead, disallow cloning and refcounted ubufs, similar to the > existing tun/tap zerocopy driver. Call skb_copy_ubufs when > cloning, looping into the receive path, .. > > Mark these packets as being zerocopy pages (SKBTX_ZEROCOP_FRAG) > But, we cannot assign skb_shinfo(skb)->destructor_arg to hold a uarg, > because tpacket already uses that for tpacket_destruct_skb. We also do > not want or need the extra alloc/free. So introduce a zerocopy variant that > expects this field to not be dereferencable. > > I did not find a cleaner way to do this so far than with pointer > tricks. Tentative > patch that fixes the issue in my test: > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index a2e8297a5b00..c500e3c86a12 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -1334,6 +1334,22 @@ static inline void skb_zcopy_set(struct sk_buff > *skb, struct ubuf_info *uarg) > } > } > > +static inline void skb_zcopy_set_nouarg(struct sk_buff *skb, void *val) > +{ > + skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t) val | 0x1UL); > + skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG; > +} > + > +static inline bool skb_zcopy_is_nouarg(struct sk_buff *skb) > +{ > + return (uintptr_t) skb_shinfo(skb)->destructor_arg & 0x1UL; > +} > + > +static inline void * skb_zcopy_get_nouarg(struct sk_buff *skb) > +{ > + return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL); > +} > + > /* Release a reference on a zerocopy structure */ > static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy) > { > @@ -1343,7 +1359,7 @@ static inline void skb_zcopy_clear(struct > sk_buff *skb, bool zerocopy) > if (uarg->callback == sock_zerocopy_callback) { > uarg->zerocopy = uarg->zerocopy && zerocopy; > sock_zerocopy_put(uarg); > - } else { > + } else if (!skb_zcopy_is_nouarg(skb)) { > uarg->callback(uarg, zerocopy); > } > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index ec3095f13aae..a74650e98f42 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -2394,7 +2394,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb) > void *ph; > __u32 ts; > > - ph = skb_shinfo(skb)->destructor_arg; > + ph = skb_zcopy_get_nouarg(skb); > packet_dec_pending(&po->tx_ring); > > ts = __packet_set_timestamp(po, ph, skb); > @@ -2461,7 +2461,7 @@ static int tpacket_fill_skb(struct packet_sock > *po, struct sk_buff *skb, > skb->mark = po->sk.sk_mark; > skb->tstamp = sockc->transmit_time; > sock_tx_timestamp(&po->sk, sockc->tsflags, > &skb_shinfo(skb)->tx_flags); > - skb_shinfo(skb)->destructor_arg = ph.raw; > + skb_zcopy_set_nouarg(skb, ph.raw); > > skb_reserve(skb, hlen); > skb_reset_network_header(skb);