On 2019/3/6 5:33 AM, Willem de Bruijn wrote:
On Tue, Mar 5, 2019 at 4:03 PM Arthur Kepner <arthur.kep...@riverbed.com> wrote:
The attachment contains an UNTESTED patch (though a similar patch was tested
with a 3.10 kernel).
We've been chasing a bug where packet corruption is seen on a tap device. We
have a PACKET_MMAP socket which is bound to a tap interface. When throughput
goes above a threshold, we begin to see that packets received on the tap
device are truncated, or otherwise corrupted. We found that when packets are
enqueued to the tap device, they are fine, but by the time they are read,
they can be corrupted.
And we found that simply deferring the call to skb_orphan() (where the
destructor, tpacket_destruct_skb(), marks the frame as TP_STATUS_AVAILABLE)
fixes the problem.
Maybe there's a better fix, but this worked for us. Thoughts? (Please CC me
on replies - I'm not subscribed.)
The skb_orphan() call invokes tpacket_destruct_skb(), which updates the
entry in the packet ring to TP_STATUS_AVAILABLE.
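To make the failure mode concrete, here is a minimal user-space sketch of it,
not kernel code. All names in it (ring_frame, fake_skb, fake_destruct,
user_refill, queued_packet_intact) are hypothetical stand-ins, and it is
single-threaded, so it only shows the ordering problem, not the real
concurrency: once the frame is handed back while the packet is still queued,
the sender may overwrite the memory the packet still points at.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel objects; not the real definitions. */
enum frame_status { FRAME_AVAILABLE, FRAME_SEND_REQUEST, FRAME_SENDING };

struct ring_frame {
    enum frame_status status;
    char data[16];
};

struct fake_skb {
    struct ring_frame *frame;   /* frame to release when the skb is orphaned */
    const char *payload;        /* zero-copy: payload still lives in the ring */
};

/* Models the destructor: hands the frame back to the sender. */
static void fake_destruct(struct fake_skb *skb)
{
    skb->frame->status = FRAME_AVAILABLE;
}

/* Models the sender refilling a frame, allowed only when it is available. */
static int user_refill(struct ring_frame *f, const char *msg)
{
    assert(f != NULL && msg != NULL);
    if (f->status != FRAME_AVAILABLE)
        return 0;
    strncpy(f->data, msg, sizeof(f->data) - 1);
    f->data[sizeof(f->data) - 1] = '\0';
    f->status = FRAME_SEND_REQUEST;
    return 1;
}

/* Returns 1 if the queued packet still reads "first" when it is consumed. */
static int queued_packet_intact(int orphan_before_read)
{
    struct ring_frame frame = { FRAME_AVAILABLE, "" };
    struct fake_skb skb;
    int ok;

    user_refill(&frame, "first");
    frame.status = FRAME_SENDING;       /* kernel picked the frame up */
    skb.frame = &frame;
    skb.payload = frame.data;

    if (orphan_before_read)
        fake_destruct(&skb);            /* released while still queued */

    user_refill(&frame, "second");      /* succeeds only if released */

    ok = strcmp(skb.payload, "first") == 0;   /* the reader consumes it */
    if (!orphan_before_read)
        fake_destruct(&skb);            /* deferred release, after the read */
    return ok;
}
```

In this model queued_packet_intact(1) returns 0 (the reader sees "second")
and queued_packet_intact(0) returns 1. The real code runs these steps
concurrently, so moving the release later narrows the window rather than
closing it.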
Thanks for the report and suggested fix. Delaying the call to
skb_orphan reduces the race condition between release and read, but
does not fully remove it.
As of commit 5cd8d46ea156 ("packet: copy user buffers before orphan
or clone") in 4.20 this should no longer be an issue.
That commit also reuses the msg_zerocopy infrastructure for packet ring
packets with shared memory, and creates a private copy whenever these
may be looped to a local destination that may queue indefinitely, such
as tun.
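The copy-before-release idea can be sketched in user space as follows. This
is an illustration only, not the code from the commit: shared_frame, pkt,
pkt_copy_user_buffer, pkt_orphan and copy_then_orphan_keeps_payload are
hypothetical names, and the model is single-threaded. The point is that the
packet gets its own copy of the payload before the shared frame is handed
back, so later reuse of the frame cannot change what a slow reader sees.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a ring frame shared with the sender. */
struct shared_frame {
    int available;
    char data[16];
};

struct pkt {
    struct shared_frame *frame; /* non-NULL until the frame is released */
    char *payload;              /* initially aliases frame->data */
    int owns_payload;           /* set once payload is a private copy */
};

/* Detach the packet from shared memory by taking a private copy. */
static int pkt_copy_user_buffer(struct pkt *p)
{
    size_t len;
    char *priv;

    if (p->frame == NULL || p->owns_payload)
        return 0;               /* nothing shared left to copy */
    len = strlen(p->payload) + 1;
    priv = malloc(len);
    if (priv == NULL)
        return -1;
    memcpy(priv, p->payload, len);
    p->payload = priv;
    p->owns_payload = 1;
    return 0;
}

/* Release the frame back to the sender. */
static void pkt_orphan(struct pkt *p)
{
    assert(p != NULL);
    if (p->frame != NULL) {
        p->frame->available = 1;
        p->frame = NULL;
    }
}

/* Returns 1 if the packet still reads "first" after the frame is reused. */
static int copy_then_orphan_keeps_payload(void)
{
    struct shared_frame f = { 0, "first" };
    struct pkt p = { &f, f.data, 0 };
    int ok;

    if (pkt_copy_user_buffer(&p) != 0)
        return -1;
    pkt_orphan(&p);             /* frame is now available to the sender */
    strcpy(f.data, "second");   /* sender reuses the frame */

    ok = strcmp(p.payload, "first") == 0;
    if (p.owns_payload)
        free(p.payload);
    return ok;
}
```

The cost of this approach is one extra copy on the paths that may queue
indefinitely, in exchange for not depending on when the frame is released.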
+1 and if possible the commit should be backported to 3.10. (Or try
vhost + TAP instead of packet mmap).
Thanks