Thank you for your efforts! FWIW, yes, I saw the panic two more times when using the method you described.
--
https://bugs.launchpad.net/bugs/2110885

Title:
  Kernel panic when unmounting ZFS snapshots

Status in zfs-linux package in Ubuntu:
  Fix Released
Status in zfs-linux source package in Noble:
  In Progress
Status in zfs-linux source package in Oracular:
  In Progress
Status in zfs-linux source package in Plucky:
  In Progress
Status in zfs-linux source package in Questing:
  Fix Released

Bug description:

[Impact]

ZFS mount/unmount operations can leave storage pools stuck in the 'D'
state, preventing access to any datasets.

[Test Plan]

This is not easily reproducible, but it seems to happen more frequently
when repeatedly mounting and unmounting ZFS snapshots. Below is a simple
test loop that eventually triggers the kernel panic on a test system.

1. Set up a regular ZFS pool:

   # zpool create pooltest sda sdb sdc

2. Create a ZFS filesystem on the new pool:

   # zfs create pooltest/data

3. Write random data to the ZFS dataset. For convenience, we'll use the
   attached zfs_write_unified.py script (a rough sketch of such a script
   follows this list):

   # python3 zfs_write_unified.py .

4. Create a snapshot of pooltest/data:

   # zfs snapshot pooltest/data@snapshot1

5. Mount and unmount this snapshot in a loop while zfs_write_unified.py
   is still running (the mountpoint must exist beforehand):

   # mkdir -p /var/tmp/snapshot1
   # while true; do sudo mount -t zfs pooltest/data@snapshot1 /var/tmp/snapshot1 && sleep 0.5 && sudo umount /var/tmp/snapshot1; done
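The attached zfs_write_unified.py itself is not reproduced here. As a
purely illustrative stand-in, a minimal random-write workload along
these lines would serve the same purpose (the chunk size, file
rotation, and command-line handling are assumptions, not details of the
actual script):

#!/usr/bin/env python3
# Hypothetical stand-in for the attached zfs_write_unified.py: keep the
# dataset busy by continuously rewriting random data in the target
# directory (first argument, defaulting to the current directory).
# The chunk size and file rotation count are illustrative choices.
import os
import sys

def main(target_dir):
    chunk = 1024 * 1024   # 1 MiB of random bytes per write
    ring = 16             # rotate over a small ring of files forever
    i = 0
    while True:
        path = os.path.join(target_dir, "junk_%d.bin" % (i % ring))
        with open(path, "wb") as f:
            f.write(os.urandom(chunk))
            f.flush()
            os.fsync(f.fileno())  # force the dirty data out to the pool
        i += 1

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")

Run it from (or point it at) a directory inside pooltest/data so the
live dataset keeps changing while the snapshot mount/unmount loop runs.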
[Where problems could occur]

This is a follow-up fix for an upstream zfs_prune patch. Potential
regressions would most likely show up in ZFS cleaning operations such
as pool scrubs, as well as in the unmount path. We should properly
exercise the mount/unmount code paths, as well as snapshot creation
and deletion.
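Exercising those paths can be scripted along the following lines (a
sketch only, to be run as root; it reuses the pool and dataset names
from the test plan above, and the iteration count and sleep interval
are arbitrary):

#!/usr/bin/env python3
# Illustrative exercise for the code paths named above: snapshot
# creation/deletion, snapshot mount/unmount, and a pool scrub.
import subprocess
import time

POOL = "pooltest"
DATASET = "pooltest/data"
MNT = "/var/tmp/snapcheck"

def run(*cmd):
    subprocess.run(cmd, check=True)  # raise if any step fails

run("mkdir", "-p", MNT)
for i in range(100):
    snap = "%s@stress%d" % (DATASET, i)
    run("zfs", "snapshot", snap)          # create the snapshot
    run("mount", "-t", "zfs", snap, MNT)  # mount it
    time.sleep(0.5)
    run("umount", MNT)                    # unmount it
    run("zfs", "destroy", snap)           # delete the snapshot
run("zpool", "scrub", POOL)               # finish with a cleaning pass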
[Other Info]

The fix was merged as part of the upstream v2.3.2 release, so it is not
required for Questing. The buggy v1 patch has been backported to Noble,
so only that release and newer are affected.

The breaking commit is:

  38c0324c0fb6 Linux: Fix zfs_prune panics

And the fix was introduced by:

  a0e62718cfcf Linux: Fix zfs_prune panics v2 (#17121)

--

Every now and then, the `umount` command gets stuck in the `D` state
when unmounting ZFS snapshots:

# ps aux | grep umount
root      912290  0.0  0.0  10344  2560 ?  D  Apr26  0:01 umount /mnt/zfs-snapshot-backup/var/opt/jira
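(As a side note, not part of the original report: tasks stuck in
uninterruptible sleep like this can be enumerated by scanning /proc,
for example with the illustrative helper below.)

#!/usr/bin/env python3
# Illustrative helper: list tasks currently in uninterruptible sleep
# ('D' state), such as a hung umount, by reading /proc/<pid>/status.
import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open("/proc/%s/status" % pid) as f:
            fields = dict(line.split(":", 1) for line in f if ":" in line)
        if fields["State"].strip().startswith("D"):
            print(pid, fields["Name"].strip(), fields["State"].strip())
    except (OSError, KeyError):
        continue  # process exited or is unreadable; skip it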
At the same time, we can see a kernel oops/panic in `dmesg`:

Sat 2025-04-26 02:15:43 UTC systemd[1]: mnt-zfs\x2dsnapshot\x2dbackup-var-opt-jira.mount: Deactivated successfully.
Sat 2025-04-26 02:15:44 UTC kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: #PF: supervisor instruction fetch in kernel mode
Sat 2025-04-26 02:15:44 UTC kernel: #PF: error_code(0x0010) - not-present page
Sat 2025-04-26 02:15:44 UTC kernel: PGD 8000000131251067 P4D 8000000131251067 PUD 0
Sat 2025-04-26 02:15:44 UTC kernel: Oops: 0010 [#1] PREEMPT SMP PTI
Sat 2025-04-26 02:15:44 UTC kernel: CPU: 0 PID: 486 Comm: arc_prune Tainted: P O 6.8.0-58-generic #60-Ubuntu
Sat 2025-04-26 02:15:44 UTC kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Sat 2025-04-26 02:15:44 UTC kernel: RIP: 0010:0x0
Sat 2025-04-26 02:15:44 UTC kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Sat 2025-04-26 02:15:44 UTC kernel: RSP: 0018:ffffb845c0cebd40 EFLAGS: 00010246
Sat 2025-04-26 02:15:44 UTC kernel: RAX: 0000000000000000 RBX: ffffb845c0cebdac RCX: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: RDX: 0000000000000000 RSI: ffffb845c0cebd48 RDI: ffff8f8bb1fa4f00
Sat 2025-04-26 02:15:44 UTC kernel: RBP: ffffb845c0cebd98 R08: 0000000000000000 R09: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000009ca5
Sat 2025-04-26 02:15:44 UTC kernel: R13: 0000000000000000 R14: ffff8f8ab33dc000 R15: ffff8f8bb1fa4f00
Sat 2025-04-26 02:15:44 UTC kernel: FS: 0000000000000000(0000) GS:ffff8f8cede00000(0000) knlGS:0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sat 2025-04-26 02:15:44 UTC kernel: CR2: ffffffffffffffd6 CR3: 000000013837e002 CR4: 00000000001706f0
Sat 2025-04-26 02:15:44 UTC kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sat 2025-04-26 02:15:44 UTC kernel: Call Trace:
Sat 2025-04-26 02:15:44 UTC kernel: <TASK>
Sat 2025-04-26 02:15:44 UTC kernel: ? show_regs+0x6d/0x80
Sat 2025-04-26 02:15:44 UTC kernel: ? __die+0x24/0x80
Sat 2025-04-26 02:15:44 UTC kernel: ? page_fault_oops+0x99/0x1b0
Sat 2025-04-26 02:15:44 UTC kernel: ? do_user_addr_fault+0x2e9/0x670
Sat 2025-04-26 02:15:44 UTC kernel: ? free_large_kmalloc+0x6b/0xc0
Sat 2025-04-26 02:15:44 UTC kernel: ? exc_page_fault+0x83/0x1b0
Sat 2025-04-26 02:15:44 UTC kernel: ? asm_exc_page_fault+0x27/0x30
Sat 2025-04-26 02:15:44 UTC kernel: zfs_prune+0x90/0x130 [zfs]
Sat 2025-04-26 02:15:44 UTC kernel: zpl_prune_sb+0x35/0x60 [zfs]
Sat 2025-04-26 02:15:44 UTC kernel: arc_prune_task+0x22/0x40 [zfs]
Sat 2025-04-26 02:15:44 UTC kernel: taskq_thread+0x1f6/0x3c0 [spl]
Sat 2025-04-26 02:15:44 UTC kernel: ? __pfx_default_wake_function+0x10/0x10
Sat 2025-04-26 02:15:44 UTC kernel: ? __pfx_taskq_thread+0x10/0x10 [spl]
Sat 2025-04-26 02:15:44 UTC kernel: kthread+0xf2/0x120
Sat 2025-04-26 02:15:44 UTC kernel: ? __pfx_kthread+0x10/0x10
Sat 2025-04-26 02:15:44 UTC kernel: ret_from_fork+0x47/0x70
Sat 2025-04-26 02:15:44 UTC kernel: ? __pfx_kthread+0x10/0x10
Sat 2025-04-26 02:15:44 UTC kernel: ret_from_fork_asm+0x1b/0x30
Sat 2025-04-26 02:15:44 UTC kernel: </TASK>
Sat 2025-04-26 02:15:44 UTC kernel: Modules linked in: tls tcp_diag udp_diag inet_diag xt_comment xt_set ip_set_hash_net ip_set_hash_ip ip_set xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables cfg80211 binfmt_misc intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass rapl qxl drm_ttm_helper ttm i2c_piix4 zfs(PO) pvpanic_mmio pvpanic qemu_fw_cfg spl(O) input_leds joydev mac_hid serio_raw sch_fq_codel dm_multipath efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 crct10dif_pclmul crc32_pclmul hid_generic polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 floppy virtio_rng psmouse pata_acpi usbhid hid aesni_intel crypto_simd cryptd
Sat 2025-04-26 02:15:44 UTC kernel: CR2: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: ---[ end trace 0000000000000000 ]---
Sat 2025-04-26 02:15:44 UTC kernel: RIP: 0010:0x0
Sat 2025-04-26 02:15:44 UTC kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Sat 2025-04-26 02:15:44 UTC kernel: RSP: 0018:ffffb845c0cebd40 EFLAGS: 00010246
Sat 2025-04-26 02:15:44 UTC kernel: RAX: 0000000000000000 RBX: ffffb845c0cebdac RCX: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: RDX: 0000000000000000 RSI: ffffb845c0cebd48 RDI: ffff8f8bb1fa4f00
Sat 2025-04-26 02:15:44 UTC kernel: RBP: ffffb845c0cebd98 R08: 0000000000000000 R09: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000009ca5
Sat 2025-04-26 02:15:44 UTC kernel: R13: 0000000000000000 R14: ffff8f8ab33dc000 R15: ffff8f8bb1fa4f00
Sat 2025-04-26 02:15:44 UTC kernel: FS: 0000000000000000(0000) GS:ffff8f8cede00000(0000) knlGS:0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sat 2025-04-26 02:15:44 UTC kernel: CR2: ffffffffffffffd6 CR3: 000000013837e002 CR4: 00000000001706f0
Sat 2025-04-26 02:15:44 UTC kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sat 2025-04-26 02:15:44 UTC kernel: note: arc_prune[486] exited with irqs disabled

Here is another stack trace in the same situation on a different VM:

[May10 04:58] general protection fault, probably for non-canonical address 0x636f6c2f7273752f: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000037] CPU: 3 PID: 676 Comm: arc_prune Tainted: P O 6.8.0-55-generic #57-Ubuntu
[ +0.000022] Hardware name: Hetzner vServer/Standard PC (Q35 + ICH9, 2009), BIOS 20171111 11/11/2017
[ +0.000020] RIP: 0010:srso_alias_safe_ret+0x5/0x7
[ +0.000019] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8d 64 24 08 <c3> cc e8 f4 ff ff ff 0f 0b cc cc cc cc cc cc cc cc cc cc cc cc cc
[ +0.000044] RSP: 0018:ffff9e32c043fd38 EFLAGS: 00010293
[ +0.000015] RAX: 636f6c2f7273752f RBX: ffff9e32c043fdac RCX: 0000000000000000
[ +0.000016] RDX: 0000000000000000 RSI: ffff9e32c043fd48 RDI: ffff8d5002f1db80
[ +0.000016] RBP: ffff9e32c043fd98 R08: 0000000000000000 R09: 0000000000000000
[ +0.000021] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000059b
[ +0.000016] R13: 0000000000000000 R14: ffff8d5010ada000 R15: ffff8d5002f1db80
[ +0.000019] FS: 0000000000000000(0000) GS:ffff8d5730780000(0000) knlGS:0000000000000000
[ +0.000019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000013] CR2: 00007fdf8ea1b000 CR3: 0000000113a3c004 CR4: 0000000000770ef0
[ +0.000017] PKRU: 55555554
[ +0.000009] Call Trace:
[ +0.000009] <TASK>
[ +0.000010] ? show_regs+0x6d/0x80
[ +0.000014] ? die_addr+0x37/0xa0
[ +0.000011] ? exc_general_protection+0x1db/0x480
[ +0.000015] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000015] ? asm_exc_general_protection+0x27/0x30
[ +0.000017] ? srso_alias_safe_ret+0x5/0x7
[ +0.000012] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000014] ? zfs_prune+0xf7/0x130 [zfs]
[ +0.000234] zpl_prune_sb+0x35/0x60 [zfs]
[ +0.000202] arc_prune_task+0x22/0x40 [zfs]
[ +0.000211] taskq_thread+0x1f6/0x3c0 [spl]
[ +0.000026] ? __pfx_default_wake_function+0x10/0x10
[ +0.000019] ? __pfx_taskq_thread+0x10/0x10 [spl]
[ +0.000023] kthread+0xf2/0x120
[ +0.000013] ? __pfx_kthread+0x10/0x10
[ +0.000014] ret_from_fork+0x47/0x70
[ +0.000013] ? __pfx_kthread+0x10/0x10
[ +0.000013] ret_from_fork_asm+0x1b/0x30
[ +0.000017] </TASK>
[ +0.000009] Modules linked in: tls tcp_diag udp_diag inet_diag xt_comment xt_set ip_set_hash_net ip_set_hash_ip ip_set xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables binfmt_misc nls_iso8859_1 zfs(PO) spl(O) input_leds joydev serio_raw sch_fq_codel dm_multipath efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic usbhid hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 ahci psmouse libahci virtio_gpu xhci_pci virtio_rng xhci_pci_renesas virtio_dma_buf aesni_intel crypto_simd cryptd
[ +0.000208] ---[ end trace 0000000000000000 ]---
[ +0.758503] RIP: 0010:srso_alias_safe_ret+0x5/0x7
[ +0.000038] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8d 64 24 08 <c3> cc e8 f4 ff ff ff 0f 0b cc cc cc cc cc cc cc cc cc cc cc cc cc
[ +0.000039] RSP: 0018:ffff9e32c043fd38 EFLAGS: 00010293
[ +0.000579] RAX: 636f6c2f7273752f RBX: ffff9e32c043fdac RCX: 0000000000000000
[ +0.000614] RDX: 0000000000000000 RSI: ffff9e32c043fd48 RDI: ffff8d5002f1db80
[ +0.000615] RBP: ffff9e32c043fd98 R08: 0000000000000000 R09: 0000000000000000
[ +0.000575] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000059b
[ +0.000503] R13: 0000000000000000 R14: ffff8d5010ada000 R15: ffff8d5002f1db80
[ +0.000450] FS: 0000000000000000(0000) GS:ffff8d5730780000(0000) knlGS:0000000000000000
[ +0.000504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000481] CR2: 00007fdf8ea1b000 CR3: 000000010f716005 CR4: 0000000000770ef0
[ +0.000381] PKRU: 55555554

The end result is a system that cannot be shut down cleanly anymore,
because unmounting never finishes.

This is *not* easily reproducible. We run about 300 systems with Ubuntu
24.04, each one mounting and unmounting ZFS snapshots at least once per
day. On those, we saw the bug 3 times in the last 2 months or so.

Mounting/unmounting ZFS snapshots is part of our backup software. We
have been doing that for many years now, and this bug only started
appearing with Ubuntu 24.04.

Let me know if you need any more info. Thanks!

More info:

# lsb_release -rd
No LSB modules are available.
Description:    Ubuntu 24.04.2 LTS
Release:        24.04

# apt-cache policy zfsutils-linux
zfsutils-linux:
  Installed: 2.2.2-0ubuntu9.2
  Candidate: 2.2.2-0ubuntu9.2
  Version table:
 *** 2.2.2-0ubuntu9.2 500
        500 mirror+file:/etc/apt/mirrors/ubuntu.txt noble-updates/main amd64 Packages
        100 /var/lib/dpkg/status

# uname -a
Linux foo 6.8.0-59-generic #61-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 11 23:16:11 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux