Hi Benjamin,

On Wed, Nov 27, 2024 at 04:06:34PM +0100, Salvatore Bonaccorso wrote:
> Hi
> 
> On Fri, Nov 22, 2024 at 01:46:17AM +0100, Benjamin Drung wrote:
> > On Thu, 2024-11-21 at 22:03 +0100, Salvatore Bonaccorso wrote:
> > > Control: tags -1 + moreinfo
> > > 
> > > Hi Benjamin,
> > > 
> > > On Wed, Nov 20, 2024 at 02:22:42AM +0100, Benjamin Drung wrote:
> > > > Package: linux
> > > > Version: 6.11.9-1
> > > > Severity: normal
> > > > X-Debbugs-Cc: bdr...@debian.org
> > > > 
> > > > Dear Maintainer,
> > > > 
> > > > Running the dracut test TEST-60-NFS on Debian unstable with
> > > > linux-image-6.11.9-amd64 fails with the following kernel crash:
> > > > 
> > > > ```
> > > > [   15.600535] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state 
> > > > recovery directory
> > > > [   15.602863] NFSD: Using legacy client tracking operations.
> > > > [   15.603059] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state 
> > > > recovery directory
> > > > [   15.603569] ------------[ cut here ]------------
> > > > [   15.603706] kernel BUG at fs/nfsd/nfs4recover.c:534!
> > > > [   15.604360] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > [   15.604743] CPU: 0 UID: 0 PID: 471 Comm: rpc.nfsd Not tainted 
> > > > 6.11.9-amd64 #1  Debian 6.11.9-1
> > > > [   15.605019] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > > 1.16.3-debian-1.16.3-2 04/01/2014
> > > > [   15.605337] RIP: 0010:nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > > [   15.606083] Code: 19 48 89 de 48 c7 c7 10 90 9c c0 e8 6d fb ff ff 89 
> > > > c5 85 c0 0f 85 30 60 00 00 48 c7 c7 c0 af a3 c0 31 ed e8 25 b0 ca d2 eb 
> > > > 07 <0f> 0b bd f4 ff ff ff 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 75
> > > > [   15.606343] RSP: 0018:ff345c4e803fbb60 EFLAGS: 00010286
> > > > [   15.606343] RAX: 0000000000000049 RBX: ff2fd43447182000 RCX: 
> > > > 0000000000000003
> > > > [   15.606343] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 
> > > > 0000000000000001
> > > > [   15.606343] RBP: ffffffff9525dd40 R08: 0000000000000000 R09: 
> > > > ff345c4e803fb9f0
> > > > [   15.606343] R10: ffffffff946b41e8 R11: 0000000000000003 R12: 
> > > > ff2fd43447182000
> > > > [   15.606343] R13: ff2fd43447182000 R14: ff2fd43469336c00 R15: 
> > > > ff2fd43447182000
> > > > [   15.606343] FS:  00007fe05a5e9740(0000) GS:ff2fd4347ce00000(0000) 
> > > > knlGS:0000000000000000
> > > > [   15.606343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [   15.606343] CR2: 0000559addf39db0 CR3: 000000002836e000 CR4: 
> > > > 0000000000751ef0
> > > > [   15.606343] PKRU: 55555554
> > > > [   15.606343] Call Trace:
> > > > [   15.606343]  <TASK>
> > > > [   15.606343]  ? __die_body.cold+0x19/0x27
> > > > [   15.606343]  ? die+0x2e/0x50
> > > > [   15.606343]  ? do_trap+0xca/0x110
> > > > [   15.606343]  ? do_error_trap+0x6a/0x90
> > > > [   15.606343]  ? nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > > [   15.606343]  ? exc_invalid_op+0x50/0x70
> > > > [   15.606343]  ? nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > > [   15.606343]  ? asm_exc_invalid_op+0x1a/0x20
> > > > [   15.606343]  ? nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > > [   15.606343]  nfsd4_client_tracking_init+0x57/0x1b0 [nfsd]
> > > > [   15.606343]  nfs4_state_start_net+0x2f9/0x3a0 [nfsd]
> > > > [   15.606343]  nfsd_svc+0x1b9/0x340 [nfsd]
> > > > [   15.606343]  write_threads+0xfc/0x1c0 [nfsd]
> > > > [   15.606343]  ? __pfx_write_threads+0x10/0x10 [nfsd]
> > > > [   15.606343]  nfsctl_transaction_write+0x4d/0x80 [nfsd]
> > > > [   15.606343]  vfs_write+0xfe/0x460
> > > > [   15.606343]  ksys_write+0x6d/0xf0
> > > > [   15.606343]  do_syscall_64+0x82/0x190
> > > > [   15.606343]  ? syscall_exit_to_user_mode+0x4d/0x210
> > > > [   15.606343]  ? do_syscall_64+0x8e/0x190
> > > > [   15.606343]  ? __x64_sys_getdents64+0xfa/0x130
> > > > [   15.606343]  ? __pfx_filldir64+0x10/0x10
> > > > [   15.606343]  ? syscall_exit_to_user_mode+0x4d/0x210
> > > > [   15.606343]  ? do_syscall_64+0x8e/0x190
> > > > [   15.606343]  ? __count_memcg_events+0x58/0xf0
> > > > [   15.606343]  ? count_memcg_events.constprop.0+0x1a/0x30
> > > > [   15.606343]  ? handle_mm_fault+0x1bb/0x2c0
> > > > [   15.606343]  ? do_user_addr_fault+0x36c/0x620
> > > > [   15.606343]  ? exc_page_fault+0x7e/0x180
> > > > [   15.606343]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > > [   15.606343] RIP: 0033:0x7fe05a6f0210
> > > > [   15.606343] Code: 2c 0e 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 66 
> > > > 2e 0f 1f 84 00 00 00 00 00 80 3d 59 ae 0e 00 00 74 17 b8 01 00 00 00 0f 
> > > > 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
> > > > [   15.606343] RSP: 002b:00007fff649d2b08 EFLAGS: 00000202 ORIG_RAX: 
> > > > 0000000000000001
> > > > [   15.606343] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 
> > > > 00007fe05a6f0210
> > > > [   15.606343] RDX: 0000000000000002 RSI: 000056540dbbb340 RDI: 
> > > > 0000000000000003
> > > > [   15.606343] RBP: 000056540dbbb340 R08: 0000000000000064 R09: 
> > > > 00000000ffffffff
> > > > [   15.606343] R10: 0000000000000000 R11: 0000000000000202 R12: 
> > > > 0000000000020000
> > > > [   15.606343] R13: 000056540dbb7116 R14: 000056543353a2a0 R15: 
> > > > 0000000000000000
> > > > [   15.606343]  </TASK>
> > > > [   15.606343] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace 
> > > > ext4 crc16 mbcache jbd2 crc32c_generic sd_mod ahci libahci libata 
> > > > virtio_scsi scsi_mod crc32_pclmul crc32c_intel scsi_common virtio_net 
> > > > net_failover failover i6300esb watchdog sunrpc qemu_fw_cfg virtio_rng 
> > > > autofs4
> > > > [   15.618032] ---[ end trace 0000000000000000 ]---
> > > > [   15.618166] RIP: 0010:nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > > [   15.618718] Code: 19 48 89 de 48 c7 c7 10 90 9c c0 e8 6d fb ff ff 89 
> > > > c5 85 c0 0f 85 30 60 00 00 48 c7 c7 c0 af a3 c0 31 ed e8 25 b0 ca d2 eb 
> > > > 07 <0f> 0b bd f4 ff ff ff 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 75
> > > > [   15.619086] RSP: 0018:ff345c4e803fbb60 EFLAGS: 00010286
> > > > [   15.619198] RAX: 0000000000000049 RBX: ff2fd43447182000 RCX: 
> > > > 0000000000000003
> > > > [   15.619336] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 
> > > > 0000000000000001
> > > > [   15.619472] RBP: ffffffff9525dd40 R08: 0000000000000000 R09: 
> > > > ff345c4e803fb9f0
> > > > [   15.619609] R10: ffffffff946b41e8 R11: 0000000000000003 R12: 
> > > > ff2fd43447182000
> > > > [   15.619746] R13: ff2fd43447182000 R14: ff2fd43469336c00 R15: 
> > > > ff2fd43447182000
> > > > [   15.619888] FS:  00007fe05a5e9740(0000) GS:ff2fd4347ce00000(0000) 
> > > > knlGS:0000000000000000
> > > > [   15.620045] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [   15.620158] CR2: 0000559addf39db0 CR3: 000000002836e000 CR4: 
> > > > 0000000000751ef0
> > > > [   15.620296] PKRU: 55555554
> > > > [   15.620469] Kernel panic - not syncing: Fatal exception
> > > > [   15.621342] Kernel Offset: 0x11a00000 from 0xffffffff81000000 
> > > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > ```
> > > > 
> > > > This crash is 100% reproducible and I can easily test different kernels.
> > > > The TEST-60-NFS works fine on Ubuntu oracular.
> > > > linux-image-6.12-rc6-amd64 6.12~rc6-1~exp1 from experimental is affected
> > > > as well.
> > > 
> > > Just to be clear, is this something you freshly hit with those versions,
> > > or was the problem present before? If you have a last good version,
> > > would you be able to bisect the changes to identify the culprit
> > > introducing the issue?
> > 
> > I hit this bug when I tried to introduce the nfs autopkgtest. I don't
> > know of a good version in Debian. I pushed the upstream-dracut-network-nfs
> > autopkgtest for dracut to the debian-nfs branch:
> > https://salsa.debian.org/debian/dracut/-/commits/debian-nfs?ref_type=heads
> > Test:
> > https://salsa.debian.org/debian/dracut/-/commit/a5b1da9ff33d412cc886408c3e6cafec265d6e29
> > So you should be able to reproduce it.
> > 
> > The same test case upstream-dracut-network-nfs works on Ubuntu with
> > linux 6.11.0-8.8:
> > https://autopkgtest.ubuntu.com/results/autopkgtest-plucky/plucky/amd64/d/dracut/20241121_232300_a5f72@/log.gz
> > 
> > > So far I have not found an already known regression report specific to
> > > this, but there is a report from August that we found:
> > > https://lore.kernel.org/all/23faefd973c63f9b0ec8a735acb1ff1409776163.ca...@linuxfoundation.org/
> > 
> > Yes, that looks similar.
> > 
> > > In any case since you can reliably reproduce the issue, can you please
> > > report it to upstream (linux-nfs list and relevant maintainers)?
> > 
> > I can do that.
> 
> So far I was not able to reproduce it, but this is because my
> autopkgtest already fails before we reach that point.
> 
> What would be ideal is if we could break down the trigger into something
> that I can more easily forward to the linux-nfs list for further
> debugging.
> 
> I will continue investigating it.

So far I have had no luck. If you have spare cycles to report it upstream
yourself and keep us in the loop, that would be great.

Regards,
Salvatore
