Panic in sfc module on boot since 5.10

Trevor Hemsley Thu, 15 Apr 2021 02:09:40 -0700

Hi,

I run Fedora 32 and since kernels in the 5.10 series I have been unableto boot without getting a panic in the sfc module. I tried on 5.11.12tonight and the crash still occurs. I have tried reporting this viaFedora channels but the silence has been deafening and I suspect this isan upstream issue anyway.

I'm running an Asus X570-Pro with a 3700x processor, 64GB ECC RAM andvarious nvme/SATA disks. I have a dual port Solarflare SFN6122F PCIEcard installed that shows up in lspci as:

0b:00.0 Ethernet controller [0200]: Solarflare Communications SFC902010G Ethernet Controller [1924:0803]0b:00.1 Ethernet controller [0200]: Solarflare Communications SFC902010G Ethernet Controller [1924:0803]

I have attached jpegs of the crash on the Fedora bugzilla entryhttps://bugzilla.redhat.com/show_bug.cgi?id=1924982 but since I figuremany here won't want to download a 2.5MB attachment from a slow bugzillaI'll try to transcribe the relevant bits here:


BUG: kernel NULL pointer dereference, address: 0000000000000104
#PF: supervisor write acess in kernel mode
#PF: error_code(0x8002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
CPU: 0 PID: 1067 Comm: rngd Not tainted 5.11.12-100.fc32.x86_64 #1

Hardware name: System manufacturer System Product Name/PRIME X570-PRO,BIOS 3405 02/01/2021

RIP: 0010:efx_farch_ev_process+0x3d2/0x910 [sfc]

Code: c0 02 39 f0 76 34 c1 fe 02 41 03 b6 28 07 00 00 83 e1 03 49 8b 84f6 d0 00 00 00 48 8b 94 c8 80 09 00 00 b0 01 00 00 00 31 c9 <f0> 8f b18a 04 81 00 00 05 c0 0f 05 37 03 00 00 48 8d 74 24 20 4c

RSP: 0000:ffff9e04c0003e78 EFLAGS: 000010246
RAX: 0000000000000001 RBX: ffff89548a9b5000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8954832c0940
RBP: 000000000000001e R08: ffff9e04c0003f50 R09: ffff89636ea2b140
R10: 0000000000000000 R11: ffffffffb9a060c0 R12: 00000000000000
R13: ffff8954832c0940 R14: ffff8954832c0940 R15: ffff89548a9b5480
FS: 00007ff835b31700(0000) GS:ffff89636ea00000(0000) knlGS:0000000000000000
CS: 0010 DA: 0000 ES: 0000 CR8: 0000000000050833
CR2: 0000000000000104 CR3: 000000011c41a000 CR4: 0000000000350ef0
Call Trace:
<IRQ>
? trigger_load_balance+0x5a/0x220
efx_poll_0xcb/0x380 [sfc]
net_rx_action+0x136/0x400
__do_softirq+0xcf/0x20f
asm_call_irq_on_stack+0x12/0x20
</IRQ>
do_softirq_own_stack_0x37/0x40
__irq_exit_rcu+0xbf/0x100
common_interrupt+0x74/0x130
? asm_common_interrupt+0x8/0x40
asm_common_interrupt+0x1e/0x40
RIP: 0033:0x7ff836732b00

I won't guarantee there are no typos in that lot since the picture is abit fuzzy and so are my eyes after all that. You can find the originalon the referenced bz above.

No problems on 5.9.16 which is the last pre-5.10 kernel available forF32. Everything I've tried since 5.10 goes pop.

In case it helps, this is what sfcboot reports for one of the cards (theother is the same)


enp11s0f0np0:
  Boot image                            Option ROM only
    Link speed                          Negotiated automatically
    Link-up delay time                  5 seconds
    Banner delay time                   2 seconds
    Boot skip delay time                5 seconds
    Boot type                           Disabled
  PF MSI-X interrupt limit              512
  SR-IOV                                Disabled
  Virtual Functions on each PF          127
  VF MSI-X interrupt limit              1

and sfupdate:

enp11s0f0np0 - MAC: 00-0x-xx-0x-xx-xx (intentionally obscured)
    Firmware version:   v7.6.9
    Controller type:    Solarflare SFC9000 family
    Controller version: v3.3.2.1000
    Boot ROM version:   v5.2.2.1004

Just prior to the crash I get a pair of messages that don't lookparticularly right but I get these on 5.9.16 too and that survives.

[ 9.027961] sfc 0000:0b:00.0 enp11s0f0np0: MC command 0x2a inlen 16failed rc=-22 (raw=0) arg=0[ 9.029895] sfc 0000:0b:00.1 enp11s0f1np1: MC command 0x2a inlen 16failed rc=-22 (raw=0) arg=0

I'm not subscribed to the list so I'd be grateful for a cc on anyreplies or if I'm on entirely the wrong mailing list, feel free to letme know that too! I can supply any more information that would be usefulto get this fixed.


Thanks,

Trevor

Panic in sfc module on boot since 5.10

Reply via email to