Public bug reported:

Steps to reproduce:
1. Create RAID array (example for IMSM):
mdadm -CR imsm -e imsm -n2 /dev/nvme[01]n1
mdadm -CR vol1 -l1 -n2 /dev/nvme[01]n1

2. Hot-remove one of the drives.
3. Insert the drive back; recovery should start.
4. Reboot the platform.

Expected: the reboot completes successfully and recovery continues afterwards.
Actual: the reboot hangs and a call trace appears.
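
The hang only occurs if the reboot in step 4 is issued while recovery is
still running. The sketch below is one way to check that from
/proc/mdstat before rebooting. It is not part of the original report:
the function name and the optional file argument are illustrative, and
it assumes the usual "recovery = NN.N%" progress line that the md driver
prints in /proc/mdstat.

```shell
#!/bin/sh
# Sketch (not from the original report): succeed if any md array is
# currently recovering or resyncing, judged by the progress line in
# /proc/mdstat, for example:
#   [=>...........]  recovery = 12.6% (37043392/292945152) finish=...
# md_sync_in_progress and its optional file argument are hypothetical names.
md_sync_in_progress() {
    mdstat_file="${1:-/proc/mdstat}"
    # Require a percentage so that "resync=DELAYED" / "resync=PENDING"
    # entries are not counted as active recovery.
    grep -Eq '(recovery|resync) *= *[0-9.]+%' "$mdstat_file"
}

# Example use between step 3 and step 4:
#   if md_sync_in_progress; then
#       echo "recovery is running; rebooting now exercises the failing path"
#   fi
```

Passing a file name instead of reading /proc/mdstat directly makes it
possible to try the check against a saved copy of mdstat output.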


[  776.416504] (sd-umoun[3359]: Failed to unmount 
/run/shutdown/mounts/bd36b757b23bbc6f: Device or resource busy

[  776.445568] shutdown[1]: Could not stop MD /dev/md126: Device or
resource busy

mdadm: Cannot get exclusive access to /dev/md126:Perhaps a running
process, mounted filesystem or active volume group?

mdadm: Cannot stop container /dev/md127: member md126 still active

mdadm: Cannot get exclusive access to /dev/md126:Perhaps a running
process, mounted filesystem or active volume group?

mdadm: Cannot stop container /dev/md127: member md126 still active

[  784.636276] (sd-exec-[3360]: /usr/lib/systemd/system-shutdown/mdadm.finalrd failed with exit status 1.

[  784.647309] shutdown[1]: Unable to finalize remaining file systems, MD devices, ignoring.

[  986.549746] INFO: task shutdown:1 blocked for more than 122 seconds.

[  986.556158]       Not tainted 6.8.0-31-generic #31-Ubuntu

[  986.561586] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.

[  986.569440] task:shutdown        state:D stack:0     pid:1     tgid:1
ppid:0      flags:0x00004002

[  986.578769] Call Trace:

[  986.581236]  <TASK>

[  986.583352]  __schedule+0x27c/0x6b0

[  986.586864]  schedule+0x33/0x110

[  986.590111]  stop_sync_thread+0x135/0x1b0

[  986.594143]  ? __pfx_autoremove_wake_function+0x10/0x10

[  986.599388]  __md_stop_writes+0x19/0xf0

[  986.603240]  md_notify_reboot+0x93/0x160

[  986.607186]  notifier_call_chain+0x5e/0xe0

[  986.611301]  blocking_notifier_call_chain+0x41/0x70

[  986.616201]  kernel_restart+0x21/0xa0

[  986.619881]  __do_sys_reboot+0x156/0x250

[  986.623824]  __x64_sys_reboot+0x1b/0x30

[  986.627680]  x64_sys_call+0x223c/0x25c0

[  986.631535]  do_syscall_64+0x7f/0x180

[  986.635218]  ? irqentry_exit+0x43/0x50

[  986.638990]  ? exc_page_fault+0x94/0x1b0

[  986.642932]  entry_SYSCALL_64_after_hwframe+0x73/0x7b

[  986.648003] RIP: 0033:0x7da6531dea07

[  986.651623] RSP: 002b:00007fff2babce18 EFLAGS: 00000246 ORIG_RAX:
00000000000000a9

[  986.659211] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
00007da6531dea07

[  986.666362] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
00000000fee1dead

[  986.673509] RBP: 00007fff2babd050 R08: 0000000000000069 R09:
0000000000000000

[  986.680662] R10: 0000000000000000 R11: 0000000000000246 R12:
0000000000000001

[  986.687808] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000001234567

[  986.694956]  </TASK>

[  986.697665] Kernel panic - not syncing: hung_task: blocked tasks

[  986.703732] CPU: 0 PID: 982 Comm: khungtaskd Not tainted
6.8.0-31-generic #31-Ubuntu

[  986.711510] Hardware name: Intel Corporation WilsonCity/WilsonCity,
BIOS WLYDCRB1.SYS.0020.P84.2103030140 03/03/2021

[  986.722061] Call Trace:

[  986.724529]  <TASK>

[  986.726639]  dump_stack_lvl+0x48/0x70

[  986.730326]  dump_stack+0x10/0x20

[  986.733661]  panic+0x35f/0x3c0

[  986.736738]  check_hung_uninterruptible_tasks+0x279/0x320

[  986.742157]  ? __pfx_watchdog+0x10/0x10

[  986.746011]  watchdog+0xad/0xb0

[  986.749164]  kthread+0xef/0x120

[  986.752317]  ? __pfx_kthread+0x10/0x10

[  986.756080]  ret_from_fork+0x44/0x70

[  986.760081]  ? __pfx_kthread+0x10/0x10

[  986.764201]  ret_from_fork_asm+0x1b/0x30

[  986.768473]  </TASK>

[  986.771197] Kernel Offset: 0x1c00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)

[  986.913304] ---[ end Kernel panic - not syncing: hung_task: blocked
tasks ]---


This issue is fixed upstream in md. Please apply this patch:
https://github.com/torvalds/linux/commit/1ddeeb2a058d7b2a58ed9e820396b4ceb715d529

However, with this patch applied the customer still sees another issue:
the reboot is delayed (but no longer hangs). I cannot reproduce it on my
platform, but I know that a newer kernel (6.11-rc4) fixes all of his
issues.

If possible, please rebase to 6.11-rc4
(https://kernel.ubuntu.com/mainline/v6.11-rc4/), because all of the
customer's issues are fixed there.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2079038

Title:
  [VROC] [Ub 24.04.0/1] Kernel bug and reboot hang during recovery -
  missing patch


