Bun and Kinga,

I need you to identify the exact upstream patches, with patch IDs, that are 
needed from 6.11 before we can even consider this.  Once I see those, and if 
backporting them to 6.8 does not take a major effort, we can consider it.  
Otherwise you may have to use the HWE kernel when it reaches 6.11.  Let's start 
with the exact patch set, in addition to the fix you already identified.
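
One possible way to enumerate candidate md commits between the two releases
is sketched below. It is only an assumption about workflow: it presumes a
local clone of the upstream tree that has the v6.8 and v6.11-rc4 tags, limits
the search to drivers/md, and uses hypothetical helper names. The Ubuntu 6.8
kernel already carries stable updates, so some of the listed commits may
already be applied.

```python
import subprocess


def md_commit_range_cmd(base="v6.8", target="v6.11-rc4", path="drivers/md"):
    """Build the git command that lists non-merge commits touching `path`.

    The tag names and the path are assumptions; adjust them to the tree
    actually being compared.
    """
    return ["git", "log", "--oneline", "--no-merges",
            f"{base}..{target}", "--", path]


def list_md_commits(repo_dir, **kwargs):
    """Run the command inside a local kernel clone (hypothetical location).

    Returns one "<short id> <subject>" string per commit.
    """
    result = subprocess.run(md_commit_range_cmd(**kwargs), cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Print the command so it can be pasted into a shell inside the clone.
    print(" ".join(md_commit_range_cmd()))
```

The output is a superset of what is needed; each commit still has to be
reviewed by hand to decide whether it belongs in the backport set.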

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2079038

Title:
  [VROC] [Ub 24.04.0/1] Kernel bug and reboot hang during recovery -
  missing patch

Status in linux package in Ubuntu:
  New

Bug description:
  Steps to reproduce:
  1. Create RAID array (example for IMSM):
  mdadm -CR imsm -e imsm -n2 /dev/nvme[01]n1
  mdadm -CR vol1 -l1 -n2 /dev/nvme[01]n1

  2. Hot-remove one of the drives.
  3. Insert the drive back - recovery should start.
  4. Reboot the platform.

  Expected: the reboot completes successfully and recovery continues.
  Actual: the reboot hangs and a call trace appears.
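
Before step 4 it can help to confirm that recovery is actually in progress.
The following is a minimal sketch, assuming the usual /proc/mdstat layout;
the sample text is illustrative and was not captured from the affected
machine, and the function name is hypothetical.

```python
import re

# Illustrative /proc/mdstat content (an assumption, not taken from this report).
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md126 : active raid1 nvme0n1[1] nvme1n1[0]
      976759808 blocks super external:/md127/0 [2/1] [U_]
      [>....................]  recovery =  0.5% (5000000/976759808) finish=80.0min speed=200000K/sec

md127 : inactive nvme1n1[1](S) nvme0n1[0](S)
      10402 blocks super external:imsm

unused devices: <none>
"""


def recovering_arrays(mdstat_text):
    """Return {md_device: percent_done} for arrays that show a recovery line."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array headers start at column 0, e.g. "md126 : active raid1 ...".
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The progress line belongs to the most recent array header.
        status = re.search(r"recovery\s*=\s*([\d.]+)%", line)
        if status and current:
            progress[current] = float(status.group(1))
    return progress


if __name__ == "__main__":
    print(recovering_arrays(SAMPLE_MDSTAT))
```

On a real system the text would come from reading /proc/mdstat instead of
the sample string; an empty result means no array reports a recovery.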

  
  [  776.416504] (sd-umoun[3359]: Failed to unmount
  /run/shutdown/mounts/bd36b757b23bbc6f: Device or resource busy

  [  776.445568] shutdown[1]: Could not stop MD /dev/md126: Device or
  resource busy

  mdadm: Cannot get exclusive access to /dev/md126:Perhaps a running
  process, mounted filesystem or active volume group?

  mdadm: Cannot stop container /dev/md127: member md126 still active

  mdadm: Cannot get exclusive access to /dev/md126:Perhaps a running
  process, mounted filesystem or active volume group?

  mdadm: Cannot stop container /dev/md127: member md126 still active

  [  784.636276] (sd-exec-[3360]:
  /usr/lib/systemd/system-shutdown/mdadm.finalrd failed with exit status
  1.

  [  784.647309] shutdown[1]: Unable to finalize remaining file systems,
  MD devices, ignoring.

  [  986.549746] INFO: task shutdown:1 blocked for more than 122
  seconds.

  [  986.556158]       Not tainted 6.8.0-31-generic #31-Ubuntu

  [  986.561586] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
  disables this message.

  [  986.569440] task:shutdown        state:D stack:0     pid:1
  tgid:1     ppid:0      flags:0x00004002

  [  986.578769] Call Trace:

  [  986.581236]  <TASK>

  [  986.583352]  __schedule+0x27c/0x6b0

  [  986.586864]  schedule+0x33/0x110

  [  986.590111]  stop_sync_thread+0x135/0x1b0

  [  986.594143]  ? __pfx_autoremove_wake_function+0x10/0x10

  [  986.599388]  __md_stop_writes+0x19/0xf0

  [  986.603240]  md_notify_reboot+0x93/0x160

  [  986.607186]  notifier_call_chain+0x5e/0xe0

  [  986.611301]  blocking_notifier_call_chain+0x41/0x70

  [  986.616201]  kernel_restart+0x21/0xa0

  [  986.619881]  __do_sys_reboot+0x156/0x250

  [  986.623824]  __x64_sys_reboot+0x1b/0x30

  [  986.627680]  x64_sys_call+0x223c/0x25c0

  [  986.631535]  do_syscall_64+0x7f/0x180

  [  986.635218]  ? irqentry_exit+0x43/0x50

  [  986.638990]  ? exc_page_fault+0x94/0x1b0

  [  986.642932]  entry_SYSCALL_64_after_hwframe+0x73/0x7b

  [  986.648003] RIP: 0033:0x7da6531dea07

  [  986.651623] RSP: 002b:00007fff2babce18 EFLAGS: 00000246 ORIG_RAX:
  00000000000000a9

  [  986.659211] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
  00007da6531dea07

  [  986.666362] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
  00000000fee1dead

  [  986.673509] RBP: 00007fff2babd050 R08: 0000000000000069 R09:
  0000000000000000

  [  986.680662] R10: 0000000000000000 R11: 0000000000000246 R12:
  0000000000000001

  [  986.687808] R13: 0000000000000000 R14: 0000000000000000 R15:
  0000000001234567

  [  986.694956]  </TASK>

  [  986.697665] Kernel panic - not syncing: hung_task: blocked tasks

  [  986.703732] CPU: 0 PID: 982 Comm: khungtaskd Not tainted
  6.8.0-31-generic #31-Ubuntu

  [  986.711510] Hardware name: Intel Corporation WilsonCity/WilsonCity,
  BIOS WLYDCRB1.SYS.0020.P84.2103030140 03/03/2021

  [  986.722061] Call Trace:

  [  986.724529]  <TASK>

  [  986.726639]  dump_stack_lvl+0x48/0x70

  [  986.730326]  dump_stack+0x10/0x20

  [  986.733661]  panic+0x35f/0x3c0

  [  986.736738]  check_hung_uninterruptible_tasks+0x279/0x320

  [  986.742157]  ? __pfx_watchdog+0x10/0x10

  [  986.746011]  watchdog+0xad/0xb0

  [  986.749164]  kthread+0xef/0x120

  [  986.752317]  ? __pfx_kthread+0x10/0x10

  [  986.756080]  ret_from_fork+0x44/0x70

  [  986.760081]  ? __pfx_kthread+0x10/0x10

  [  986.764201]  ret_from_fork_asm+0x1b/0x30

  [  986.768473]  </TASK>

  [  986.771197] Kernel Offset: 0x1c00000 from 0xffffffff81000000
  (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

  [  986.913304] ---[ end Kernel panic - not syncing: hung_task: blocked
  tasks ]---

  
  This issue is fixed upstream in md. Please apply this patch:
https://github.com/torvalds/linux/commit/1ddeeb2a058d7b2a58ed9e820396b4ceb715d529

  However, the customer also sees another issue: with this patch the
  reboot is delayed (it no longer hangs). I cannot reproduce that on my
  platform, but I know that a newer kernel (6.11-rc4) fixes all of the
  customer's issues.

  If possible, please rebase to 6.11-rc4
  (https://kernel.ubuntu.com/mainline/v6.11-rc4/),
  because all of the customer's issues are fixed there.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2079038/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
