Hi Michael, On Thu, Mar 04, 2021 at 08:52:52PM +0100, michael-...@fami-braun.de wrote: > From: Michael Braun <michael-...@fami-braun.de> > > When using jumbo packets and overrunning rx queue with napi enabled, > the following sequence is observed in gfar_add_rx_frag: > > | lstatus | | skb | > t | lstatus, size, flags | first | len, data_len, *ptr | > ---+--------------------------------------+-------+-----------------------+ > 13 | 18002348, 9032, INTERRUPT LAST | 0 | 9600, 8000, f554c12e | > 12 | 10000640, 1600, INTERRUPT | 0 | 8000, 6400, f554c12e | > 11 | 10000640, 1600, INTERRUPT | 0 | 6400, 4800, f554c12e | > 10 | 10000640, 1600, INTERRUPT | 0 | 4800, 3200, f554c12e | > 09 | 10000640, 1600, INTERRUPT | 0 | 3200, 1600, f554c12e | > 08 | 14000640, 1600, INTERRUPT FIRST | 0 | 1600, 0, f554c12e | > 07 | 14000640, 1600, INTERRUPT FIRST | 1 | 0, 0, f554c12e | > 06 | 1c000080, 128, INTERRUPT LAST FIRST | 1 | 0, 0, abf3bd6e | > 05 | 18002348, 9032, INTERRUPT LAST | 0 | 8000, 6400, c5a57780 | > 04 | 10000640, 1600, INTERRUPT | 0 | 6400, 4800, c5a57780 | > 03 | 10000640, 1600, INTERRUPT | 0 | 4800, 3200, c5a57780 | > 02 | 10000640, 1600, INTERRUPT | 0 | 3200, 1600, c5a57780 | > 01 | 10000640, 1600, INTERRUPT | 0 | 1600, 0, c5a57780 | > 00 | 14000640, 1600, INTERRUPT FIRST | 1 | 0, 0, c5a57780 | > > So at t=7 a new packets is started but not finished, probably due to rx > overrun - but rx overrun is not indicated in the flags. Instead a new > packets starts at t=8. This results in skb->len to exceed size for the LAST > fragment at t=13 and thus a negative fragment size added to the skb. > > This then crashes: > > kernel BUG at include/linux/skbuff.h:2277! > Oops: Exception in kernel mode, sig: 5 [#1] > ... > NIP [c04689f4] skb_pull+0x2c/0x48 > LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844 > Call Trace: > [ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable) > [ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4 > [ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c > [ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0 > [ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc > [ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70 > [ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc > [ec4bff08] [c0062718] kthread+0x140/0x144 > [ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c > > This patch fixes this by checking for computed LAST fragment size, so a > negative sized fragment is never added. > In order to prevent the newer rx frame from getting corrupted, the FIRST > flag is checked to discard the incomplete older frame. > > Signed-off-by: Michael Braun <michael-...@fami-braun.de> > ---
Just for my understanding, do you have a reproducer for the issue? I notice you haven't answered Claudiu's questions posted on v1. On LS1021A I cannot trigger this apparent hardware issue even if I force RX overruns (by reducing the ring size). Judging from the "NIP" register from your stack trace, this is a PowerPC device, which one is it?