** Description changed: ================ Environment details: ================ - Linux Kernel Version: 5.15.0-117 - Ubuntu 22.04 - nfs-utils package: 1:1.3.4-2.5ubuntu3.3 and 1:2.6.1-1ubuntu1.2 - Using NFSv3 + - Using 256 nfsd threads on the nfs server. + - We are executing approx 30 exportfs - install and uninstall every min. ================ Issue observed: ================ - I see `refcount_t underflow` and `use-after-free` warnings associated specifically with `nfsd_write` operations in the syslogs. - After this occurs, the system eventually becomes non-functional, requiring a manual reboot. - The last NFS-related operation seems to be an `unexport` of a filesystem: exportfs -u 10.20.30.40:/fs/4d148fdf-6b18-48c8-8215-7c5dd58cac7d/c0 - After the problem occurs, syslogs flood continuously with these repetitive and only entries: VFS: Close: file count is 0 - On the system console, I consistently notice one `exportfs` process stuck in a CPU soft lockup—this persists until the system gets rebooted. ================ Stack trace ================ 2025-05-15T10:22:24.988837+00:00 <4> kernel - [7144513.654935] ------------[ cut here ]------------ 2025-05-15T10:22:24.989181+00:00 <4> kernel - [7144513.654941] refcount_t: underflow; use-after-free. 2025-05-15T10:22:24.989186+00:00 <4> kernel - [7144513.654963] WARNING: CPU: 71 PID: 2010514 at lib/refcount.c:28 refcount_warn_saturate+0xf7/0x150 ... ... 2025-05-15T10:22:24.989221+00:00 <4> kernel - [7144513.655146] RIP: 0010:refcount_warn_saturate+0xf7/0x150 2025-05-15T10:22:24.989229+00:00 <4> kernel - [7144513.655149] Code: eb 9e 0f b6 1d 12 2b b9 01 80 fb 01 0f 87 10 92 6f 00 83 e3 01 75 89 48 c7 c7 30 10 e4 8e c6 05 f6 2a b9 01 01 e8 b3 0b 6c 00 <0f> 0b e9 6f ff ff ff 0f b6 1d e1 2a b9 01 80 fb 01 0f 87 cd 91 6f 2025-05-15T10:22:24.989231+00:00 <4> kernel - [7144513.655153] RSP: 0018:ff297e72f41e7d48 EFLAGS: 00010286 2025-05-15T10:22:24.989232+00:00 <4> kernel - [7144513.655156] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027 2025-05-15T10:22:24.989233+00:00 <4> kernel - [7144513.655158] RDX: ff249fc37fae0588 RSI: 0000000000000001 RDI: ff249fc37fae0580 2025-05-15T10:22:24.989234+00:00 <4> kernel - [7144513.655160] RBP: ff297e72f41e7d50 R08: 0000000000000003 R09: fffffffffff6bb28 2025-05-15T10:22:24.989235+00:00 <4> kernel - [7144513.655162] R10: ff249fc37f16bb30 R11: 0000000000000001 R12: ff249fb2124ba380 2025-05-15T10:22:24.989239+00:00 <4> kernel - [7144513.655164] R13: ff249fb2124ba3b8 R14: 00000000000000e5 R15: ff249f4589a40000 2025-05-15T10:22:24.989240+00:00 <4> kernel - [7144513.655166] FS: 0000000000000000(0000) GS:ff249fc37fac0000(0000) knlGS:0000000000000000 2025-05-15T10:22:24.989241+00:00 <4> kernel - [7144513.655169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2025-05-15T10:22:24.989242+00:00 <4> kernel - [7144513.655171] CR2: 00007f02a7291000 CR3: 000000413b00e006 CR4: 0000000000771ee0 2025-05-15T10:22:24.989243+00:00 <4> kernel - [7144513.655173] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 2025-05-15T10:22:24.989244+00:00 <4> kernel - [7144513.655175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 2025-05-15T10:22:24.989247+00:00 <4> kernel - [7144513.655177] PKRU: 55555554 2025-05-15T10:22:24.989248+00:00 <4> kernel - [7144513.655178] Call Trace: 2025-05-15T10:22:24.989248+00:00 <4> kernel - [7144513.655181] <TASK> 2025-05-15T10:22:24.989249+00:00 <4> kernel - [7144513.655185] ? show_trace_log_lvl+0x1d6/0x2ea 2025-05-15T10:22:24.989250+00:00 <4> kernel - [7144513.655193] ? show_trace_log_lvl+0x1d6/0x2ea 2025-05-15T10:22:24.989251+00:00 <4> kernel - [7144513.655197] ? nfsd_file_put+0x10f/0x170 [nfsd] 2025-05-15T10:22:24.989251+00:00 <4> kernel - [7144513.655255] ? show_regs.part.0+0x23/0x29 2025-05-15T10:22:24.989255+00:00 <4> kernel - [7144513.655258] ? show_regs.cold+0x8/0xd 2025-05-15T10:22:24.989255+00:00 <4> kernel - [7144513.655260] ? refcount_warn_saturate+0xf7/0x150 2025-05-15T10:22:24.989256+00:00 <4> kernel - [7144513.655263] ? __warn+0x8c/0x100 2025-05-15T10:22:24.989257+00:00 <4> kernel - [7144513.655268] ? refcount_warn_saturate+0xf7/0x150 2025-05-15T10:22:24.989257+00:00 <4> kernel - [7144513.655270] ? report_bug+0xa4/0xd0 2025-05-15T10:22:24.989258+00:00 <4> kernel - [7144513.655275] ? handle_bug+0x39/0x90 2025-05-15T10:22:24.989261+00:00 <4> kernel - [7144513.655279] ? exc_invalid_op+0x19/0x70 2025-05-15T10:22:24.989262+00:00 <4> kernel - [7144513.655281] ? asm_exc_invalid_op+0x1b/0x20 2025-05-15T10:22:24.989262+00:00 <4> kernel - [7144513.655287] ? refcount_warn_saturate+0xf7/0x150 2025-05-15T10:22:24.989263+00:00 <4> kernel - [7144513.655290] ? refcount_warn_saturate+0xf7/0x150 2025-05-15T10:22:24.989264+00:00 <4> kernel - [7144513.655292] nfsd_file_put+0x10f/0x170 [nfsd] 2025-05-15T10:22:24.989265+00:00 <4> kernel - [7144513.655326] nfsd_write+0x8a/0x140 [nfsd] 2025-05-15T10:22:24.989267+00:00 <4> kernel - [7144513.655361] nfsd3_proc_write+0xe0/0x170 [nfsd] 2025-05-15T10:22:24.989268+00:00 <4> kernel - [7144513.655395] nfsd_dispatch+0x173/0x270 [nfsd] 2025-05-15T10:22:24.989268+00:00 <4> kernel - [7144513.655427] svc_process_common+0x3d7/0x720 [sunrpc] 2025-05-15T10:22:24.989269+00:00 <4> kernel - [7144513.655484] ? nfsd_svc+0x200/0x200 [nfsd] 2025-05-15T10:22:24.989270+00:00 <4> kernel - [7144513.655518] svc_process+0xbc/0x100 [sunrpc] 2025-05-15T10:22:24.989270+00:00 <4> kernel - [7144513.655561] nfsd+0xd3/0x100 [nfsd] 2025-05-15T10:22:24.989271+00:00 <4> kernel - [7144513.655590] ? trace_event_raw_event_nfsd_export_update+0x190/0x190 [nfsd] 2025-05-15T10:22:24.989272+00:00 <4> kernel - [7144513.655619] kthread+0x127/0x150 2025-05-15T10:22:24.989273+00:00 <4> kernel - [7144513.655624] ? set_kthread_struct+0x50/0x50 2025-05-15T10:22:24.989274+00:00 <4> kernel - [7144513.655628] ret_from_fork+0x1f/0x30 2025-05-15T10:22:24.989275+00:00 <4> kernel - [7144513.655636] </TASK> 2025-05-15T10:22:24.989276+00:00 <4> kernel - [7144513.655637] ---[ end trace 2c2d196f39b3f573 ]--- 2025-05-15T10:22:25.000964+00:00 <3> kernel - [7144513.668313] VFS: Close: file count is 0 2025-05-15T10:22:25.000984+00:00 <3> kernel - [7144513.668317] VFS: Close: file count is 0 2025-05-15T10:22:25.000986+00:00 <3> kernel - [7144513.668318] VFS: Close: file count is 0 - In syslog in addition to the flood of "VFS: Close: file count is 0" messages, i am also seeing the following stack traces - ======= Stack trace 1 ======= kernel - [7144714.467166] PKRU: 55555554 kernel - [7144714.467167] Call Trace: kernel - [7144714.467169] <IRQ> kernel - [7144714.467172] ? show_trace_log_lvl+0x1d6/0x2ea kernel - [7144714.467177] ? show_trace_log_lvl+0x1d6/0x2ea kernel - [7144714.467181] ? nfsd_file_free+0x14/0x130 [nfsd] kernel - [7144714.467213] ? show_regs.part.0+0x23/0x29 kernel - [7144714.467216] ? show_regs.cold+0x8/0xd kernel - [7144714.467219] ? watchdog_timer_fn+0x1be/0x220 kernel - [7144714.467223] ? lockup_detector_update_enable+0x60/0x60 kernel - [7144714.467226] ? __hrtimer_run_queues+0x104/0x230 kernel - [7144714.467230] ? clockevents_program_event+0xaa/0x130 kernel - [7144714.467234] ? hrtimer_interrupt+0x101/0x220 kernel - [7144714.467237] ? __sysvec_apic_timer_interrupt+0x5e/0xe0 kernel - [7144714.467241] ? sysvec_apic_timer_interrupt+0x7b/0x90 kernel - [7144714.467244] </IRQ> kernel - [7144714.467244] <TASK> kernel - [7144714.467245] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 kernel - [7144714.467249] ? read_tsc+0x3/0x20 kernel - [7144714.467252] ? ktime_get+0x43/0xc0 kernel - [7144714.467255] nfsd_file_free+0x14/0x130 [nfsd] kernel - [7144714.467274] __nfsd_file_cache_purge+0xed/0x150 [nfsd] kernel - [7144714.467291] nfsd_file_cache_purge+0x1e/0x30 [nfsd] kernel - [7144714.467311] expkey_flush+0x2e/0x40 [nfsd] kernel - [7144714.467330] write_flush.constprop.0+0x102/0x160 [sunrpc] kernel - [7144714.467374] write_flush_procfs+0x32/0x40 [sunrpc] kernel - [7144714.467403] proc_reg_write+0x5b/0xa0 kernel - [7144714.467408] ? __cond_resched+0x1a/0x50 kernel - [7144714.467411] vfs_write+0xc4/0x270 kernel - [7144714.467415] ksys_write+0x67/0xf0 kernel - [7144714.467419] __x64_sys_write+0x19/0x20 - ======= Stack trace 2 ======= kernel - [7144854.723115] Stopper: multi_cpu_stop+0x0/0x120 <- migrate_swap+0xab/0xf0 kernel - [7144854.723132] RIP: 0010:multi_cpu_stop+0xa1/0x120 kernel - [7144854.723137] Code: ff 0c 24 75 0f 8b 43 20 8b 4b 10 83 c0 01 89 4b 24 89 43 20 e8 60 11 fa ff 41 83 ff 04 74 34 45 89 fd 4c 89 f7 e8 4f ff ff ff <44> 8b 7b 20 45 39 fd 75 aa 41 83 ff 01 76 0a e8 0b 14 02 00 e8 36 kernel - [7144854.723143] RSP: 0000:ff297e72cdd3fe68 EFLAGS: 00000202 kernel - [7144854.723148] kernel - [7144854.723151] RAX: 0000000061b7a217 RBX: ff297e72f1187af8 RCX: ff249fc37fd63370 kernel - [7144854.723157] RDX: 0000000061b7a215 RSI: 0000000000000282 RDI: ffffffff8ea1c8a0 kernel - [7144854.723163] RBP: ff297e72cdd3fea0 R08: 0000000000000000 R09: 0000000000000000 kernel - [7144854.723167] R10: 0000000000000001 R11: 0000000000000000 R12: ff297e72f1187b1c kernel - [7144854.723171] R13: 0000000000000001 R14: ffffffff8ea1c8a0 R15: 0000000000000001 kernel - [7144854.723175] FS: 0000000000000000(0000) GS:ff249fc37fd40000(0000) knlGS:0000000000000000 kernel - [7144854.723181] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel - [7144854.723186] CR2: 00007f8cf5ece9c0 CR3: 0000001d19610005 CR4: 0000000000771ee0 kernel - [7144854.723191] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel - [7144854.723197] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kernel - [7144854.723200] PKRU: 55555554 kernel - [7144854.723205] Call Trace: kernel - [7144854.723208] <IRQ> kernel - [7144854.723214] ? show_trace_log_lvl+0x1d6/0x2ea kernel - [7144854.723223] ? show_trace_log_lvl+0x1d6/0x2ea kernel - [7144854.723228] ? cpu_stopper_thread+0xd3/0x170 kernel - [7144854.723234] ? show_regs.part.0+0x23/0x29 kernel - [7144854.723239] ? show_regs.cold+0x8/0xd kernel - [7144854.723245] ? watchdog_timer_fn+0x1be/0x220 kernel - [7144854.723253] ? lockup_detector_update_enable+0x60/0x60 kernel - [7144854.723260] ? __hrtimer_run_queues+0x104/0x230 kernel - [7144854.723267] ? clockevents_program_event+0xaa/0x130 kernel - [7144854.723276] ? hrtimer_interrupt+0x101/0x220 kernel - [7144854.723282] ? __sysvec_apic_timer_interrupt+0x5e/0xe0 kernel - [7144854.723289] ? sysvec_apic_timer_interrupt+0x7b/0x90 kernel - [7144854.723297] </IRQ> kernel - [7144854.723299] <TASK> kernel - [7144854.723302] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 kernel - [7144854.723310] ? multi_cpu_stop+0xa1/0x120 kernel - [7144854.723316] ? multi_cpu_stop+0xa1/0x120 kernel - [7144854.723321] ? stop_machine_yield+0x10/0x10 kernel - [7144854.723326] cpu_stopper_thread+0xd3/0x170 kernel - [7144854.723331] smpboot_thread_fn+0xb7/0x160 kernel - [7144854.723338] ? smpboot_register_percpu_thread+0x140/0x140 kernel - [7144854.723345] kthread+0x127/0x150 kernel - [7144854.723353] ? set_kthread_struct+0x50/0x50 kernel - [7144854.723360] ret_from_fork+0x1f/0x30 kernel - [7144854.723368] </TASK> - ======= Stack trace 3 ======= kernel - [7144937.588413] systemd[1]: Stopping Grafana Tempo tracing service... kernel - [7144933.695807] rcu: INFO: rcu_sched self-detected stall on CPU kernel - [7144933.695809] rcu: 32-....: (104980 ticks this GP) idle=b27/1/0x4000000000000000 softirq=821553683/821553686 fqs=42983 kernel - [7144933.695818] (t=105009 jiffies g=1271480909 q=393063079) kernel - [7144933.695822] NMI backtrace for cpu 32 kernel - [7144933.695828] Call Trace: kernel - [7144933.695829] <IRQ> kernel - [7144933.695832] show_stack+0x52/0x5c kernel - [7144933.695839] dump_stack_lvl+0x4a/0x63 kernel - [7144933.695842] dump_stack+0x10/0x16 kernel - [7144933.695844] nmi_cpu_backtrace.cold+0x4d/0x93 kernel - [7144933.695846] ? lapic_can_unplug_cpu+0x90/0x90 kernel - [7144933.695850] nmi_trigger_cpumask_backtrace+0xec/0x100 kernel - [7144933.695854] arch_trigger_cpumask_backtrace+0x19/0x20 kernel - [7144933.695856] trigger_single_cpu_backtrace+0x44/0x4f kernel - [7144933.695859] rcu_dump_cpu_stacks+0x102/0x149 kernel - [7144933.695862] print_cpu_stall.cold+0x2f/0xe2 kernel - [7144933.695865] check_cpu_stall+0x1d8/0x270 kernel - [7144933.695869] rcu_sched_clock_irq+0x9a/0x250 kernel - [7144933.695871] update_process_times+0x94/0xd0 kernel - [7144933.695874] tick_sched_handle+0x29/0x70 kernel - [7144933.695876] tick_sched_timer+0x6f/0x90 kernel - [7144933.695878] ? tick_sched_do_timer+0xa0/0xa0 kernel - [7144933.695880] __hrtimer_run_queues+0x104/0x230 kernel - [7144933.695882] ? clockevents_program_event+0xaa/0x130 kernel - [7144933.695886] hrtimer_interrupt+0x101/0x220 kernel - [7144933.695889] __sysvec_apic_timer_interrupt+0x5e/0xe0 kernel - [7144933.695892] sysvec_apic_timer_interrupt+0x7b/0x90 kernel - [7144933.695896] </IRQ> kernel - [7144933.695897] <TASK> kernel - [7144933.695899] asm_sysvec_apic_timer_interrupt+0x1b/0x20 kernel - [7144933.695904] RIP: 0010:desc_read_finalized_seq+0x63/0xa0 kernel - [7144933.695908] Code: 01 76 43 83 f8 01 74 3e 4c 39 65 e0 75 38 83 f8 03 74 3a 31 c0 48 83 7b 08 01 75 0e 31 c0 48 83 7b 10 01 0f 95 c0 8d 44 00 fe <48> 8b 55 e8 65 48 2b 14 25 28 00 00 00 75 1b 48 83 c4 10 5b 41 5c
-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/2111213 Title: System hang due refcount_t underflow issue and VFS: Close: file count is 0 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2111213/+subscriptions -- ubuntu-bugs mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
