Re: [PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash

Vladimir Oltean Thu, 04 Mar 2021 12:19:34 -0800

Hi Michael,

On Thu, Mar 04, 2021 at 08:52:52PM +0100, michael-...@fami-braun.de wrote:
> From: Michael Braun <michael-...@fami-braun.de>
>
> When using jumbo packets and overrunning the rx queue with napi enabled,
> the following sequence is observed in gfar_add_rx_frag:
>
>    | lstatus                              |       | skb                   |
> t  | lstatus,  size, flags                | first | len, data_len, *ptr   |
> ---+--------------------------------------+-------+-----------------------+
> 13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000,  f554c12e |
> 12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400,  f554c12e |
> 11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  f554c12e |
> 10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  f554c12e |
> 09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  f554c12e |
> 08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0,     f554c12e |
> 07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     f554c12e |
> 06 | 1c000080, 128,  INTERRUPT LAST FIRST | 1     | 0,    0,     abf3bd6e |
> 05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400,  c5a57780 |
> 04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  c5a57780 |
> 03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  c5a57780 |
> 02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  c5a57780 |
> 01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0,     c5a57780 |
> 00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     c5a57780 |
>
> So at t=7 a new packet is started but not finished, probably due to rx
> overrun - but rx overrun is not indicated in the flags. Instead a new
> packet starts at t=8. This causes skb->len to exceed size for the LAST
> fragment at t=13, so a negative fragment size is added to the skb.
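
For reference, the negative size at t=13 can be reproduced with plain
arithmetic. The sketch below is a standalone model, not driver code: the
mask and the helper name last_frag_size() are assumptions read off the
table above (the low 16 bits of lstatus match the size column), and it
assumes the LAST descriptor reports the full frame length, so the final
fragment is that length minus what the skb already holds.

```c
#include <stdint.h>

/* Assumption: low 16 bits of lstatus carry the length, as the
 * lstatus/size columns of the table suggest (0x2348 == 9032). */
#define LSTATUS_LEN_MASK 0x0000ffffu

/* Hypothetical helper: size of the final fragment, given the lstatus of
 * the LAST descriptor and the bytes already accumulated in the skb. */
static int last_frag_size(uint32_t lstatus, unsigned int skb_len)
{
    int frame_len = (int)(lstatus & LSTATUS_LEN_MASK);

    /* Signed on purpose: the result goes negative when skb_len
     * already exceeds the reported frame length. */
    return frame_len - (int)skb_len;
}
```

With the values from the table, last_frag_size(0x18002348, 8000) gives
1032 for the intact frame ending at t=5, while
last_frag_size(0x18002348, 9600) gives -568 at t=13.
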
>
> This then crashes:
>
> kernel BUG at include/linux/skbuff.h:2277!
> Oops: Exception in kernel mode, sig: 5 [#1]
> ...
> NIP [c04689f4] skb_pull+0x2c/0x48
> LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
> Call Trace:
> [ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
> [ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
> [ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
> [ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
> [ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
> [ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
> [ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
> [ec4bff08] [c0062718] kthread+0x140/0x144
> [ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c
>
> This patch fixes this by checking the computed size of the LAST fragment,
> so a negative sized fragment is never added.
> In order to prevent the newer rx frame from getting corrupted, the FIRST
> flag is checked to discard the incomplete older frame.
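
The two checks described above could look roughly like the following. This
is a minimal userspace sketch under stated assumptions, not the patch
itself: struct rx_frame stands in for the skb being assembled,
rx_add_buffer() and the enum are hypothetical names, and the flag bits are
inferred from the lstatus and flags columns of the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, inferred from the table (not copied from gianfar.h). */
#define BD_LEN_MASK 0x0000ffffu
#define BD_LAST     0x08000000u
#define BD_FIRST    0x04000000u

/* Hypothetical stand-in for the skb under assembly. */
struct rx_frame {
    bool active;        /* a frame has been started and not yet finished */
    unsigned int len;   /* bytes accumulated so far */
};

enum rx_result { RX_MORE, RX_DONE, RX_DROP };

static enum rx_result rx_add_buffer(struct rx_frame *f, uint32_t lstatus)
{
    int size = (int)(lstatus & BD_LEN_MASK);

    if (lstatus & BD_FIRST) {
        /* FIRST always starts from scratch; if an older frame was still
         * open, it never saw its LAST buffer and is discarded here. */
        f->active = true;
        f->len = 0;
    } else if (!f->active) {
        return RX_DROP;         /* continuation without a FIRST */
    }

    if (lstatus & BD_LAST) {
        /* LAST reports the whole frame length; subtract what we hold. */
        size -= (int)f->len;
        if (size < 0) {         /* never add a negative-sized fragment */
            f->active = false;
            f->len = 0;
            return RX_DROP;
        }
    }

    f->len += (unsigned int)size;
    if (lstatus & BD_LAST) {
        f->active = false;
        return RX_DONE;
    }
    return RX_MORE;
}
```

Fed the fourteen lstatus values from the table in order (t=0 to t=13), this
model drops the buffer from t=7 when the second FIRST arrives at t=8 and
finishes the frame at t=13 with a length of 9032 instead of computing a
negative fragment.
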
>
> Signed-off-by: Michael Braun <michael-...@fami-braun.de>
> ---


Just for my understanding, do you have a reproducer for the issue?
I notice you haven't answered Claudiu's questions posted on v1.
On LS1021A I cannot trigger this apparent hardware issue even if I force
RX overruns (by reducing the ring size). Judging from the "NIP" register
in your stack trace, this is a PowerPC device. Which one is it?
