Public bug reported:

Steps:

Connect the hosts to the NVMeOF enclosure and run the IOs on the drives.

Observation:

The IO tool process (maim) is blocked continuously, after which all
connected drives are disconnected.

*From Syslog :*
{noformat}
2025-05-18T02:49:10.268427+05:30 blr-r29-26u kernel: INFO: task maim:1009904 blocked for more than 122 seconds.
2025-05-18T02:49:10.268464+05:30 blr-r29-26u kernel:       Tainted: P           O       6.8.0-55-generic #57-Ubuntu
2025-05-18T02:49:10.268490+05:30 blr-r29-26u kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2025-05-18T02:49:10.270665+05:30 blr-r29-26u kernel: task:maim            state:D stack:0     pid:1009904 tgid:1004274 ppid:975377 flags:0x00004002
2025-05-18T02:49:10.270691+05:30 blr-r29-26u kernel: Call Trace:
2025-05-18T02:49:10.270695+05:30 blr-r29-26u kernel:  <TASK>
2025-05-18T02:49:10.270698+05:30 blr-r29-26u kernel:  __schedule+0x27c/0x6b0
2025-05-18T02:49:10.270700+05:30 blr-r29-26u kernel:  schedule+0x33/0x110
2025-05-18T02:49:10.270703+05:30 blr-r29-26u kernel:  io_schedule+0x46/0x80
2025-05-18T02:49:10.270709+05:30 blr-r29-26u kernel:  folio_wait_bit_common+0x136/0x330
2025-05-18T02:49:10.270713+05:30 blr-r29-26u kernel:  ? __pfx_wake_page_function+0x10/0x10
2025-05-18T02:49:10.270715+05:30 blr-r29-26u kernel:  folio_wait_bit+0x18/0x30
2025-05-18T02:49:10.270718+05:30 blr-r29-26u kernel:  folio_wait_writeback+0x2b/0xa0
2025-05-18T02:49:10.270720+05:30 blr-r29-26u kernel:  __filemap_fdatawait_range+0x93/0x110
2025-05-18T02:49:10.270723+05:30 blr-r29-26u kernel:  file_write_and_wait_range+0x93/0xc0
2025-05-18T02:49:10.270727+05:30 blr-r29-26u kernel:  ext4_sync_file+0x8d/0x380
2025-05-18T02:49:10.270730+05:30 blr-r29-26u kernel:  vfs_fsync_range+0x4b/0xa0
2025-05-18T02:49:10.270733+05:30 blr-r29-26u kernel:  ? __fdget+0xc7/0xf0
2025-05-18T02:49:10.270736+05:30 blr-r29-26u kernel:  __x64_sys_fsync+0x3c/0x70
2025-05-18T02:49:10.270738+05:30 blr-r29-26u kernel:  x64_sys_call+0x2550/0x25a0
2025-05-18T02:49:10.270741+05:30 blr-r29-26u kernel:  do_syscall_64+0x7f/0x180
2025-05-18T02:49:10.270745+05:30 blr-r29-26u kernel:  ? filemap_get_entry+0xe5/0x160
2025-05-18T02:49:10.270748+05:30 blr-r29-26u kernel:  ? __block_commit_write+0x82/0xc0
2025-05-18T02:49:10.270752+05:30 blr-r29-26u kernel:  ? block_write_end+0x4a/0xd0
2025-05-18T02:49:10.270756+05:30 blr-r29-26u kernel:  ? copy_page_from_iter_atomic+0xed/0x690
2025-05-18T02:49:10.270758+05:30 blr-r29-26u kernel:  ? radix_tree_lookup+0xd/0x20
2025-05-18T02:49:10.270760+05:30 blr-r29-26u kernel:  ? balance_dirty_pages_ratelimited_flags+0x140/0x3b0
2025-05-18T02:49:10.270763+05:30 blr-r29-26u kernel:  ? balance_dirty_pages_ratelimited+0x10/0x20
2025-05-18T02:49:10.270765+05:30 blr-r29-26u kernel:  ? generic_perform_write+0x155/0x230
2025-05-18T02:49:10.270766+05:30 blr-r29-26u kernel:  ? vfs_write+0x325/0x480
2025-05-18T02:49:10.270768+05:30 blr-r29-26u kernel:  ? vfs_write+0x325/0x480
2025-05-18T02:49:10.270773+05:30 blr-r29-26u kernel:  ? __f_unlock_pos+0x12/0x20
2025-05-18T02:49:10.270774+05:30 blr-r29-26u kernel:  ? ksys_write+0xe6/0x100
2025-05-18T02:49:10.270776+05:30 blr-r29-26u kernel:  ? syscall_exit_to_user_mode+0x86/0x260
2025-05-18T02:49:10.270779+05:30 blr-r29-26u kernel:  ? do_syscall_64+0x8c/0x180
2025-05-18T02:49:10.270780+05:30 blr-r29-26u kernel:  ? irqentry_exit_to_user_mode+0x7b/0x260
2025-05-18T02:49:10.270781+05:30 blr-r29-26u kernel:  ? irqentry_exit+0x43/0x50
2025-05-18T02:49:10.270783+05:30 blr-r29-26u kernel:  ? common_interrupt+0x54/0xb0
2025-05-18T02:49:10.270785+05:30 blr-r29-26u kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
2025-05-18T02:49:10.270789+05:30 blr-r29-26u kernel: RIP: 0033:0x7207f251ee0c
2025-05-18T02:49:10.270791+05:30 blr-r29-26u kernel: RSP: 002b:000072068b7fc3c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
2025-05-18T02:49:10.270793+05:30 blr-r29-26u kernel: RAX: ffffffffffffffda RBX: 000072068b7fc440 RCX: 00007207f251ee0c
2025-05-18T02:49:10.270797+05:30 blr-r29-26u kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000308
2025-05-18T02:49:10.270799+05:30 blr-r29-26u kernel: RBP: 000072068b7fc3d0 R08: 0000000000000000 R09: 000072068b7fc440
2025-05-18T02:49:10.270803+05:30 blr-r29-26u kernel: R10: 00000000196e4390 R11: 0000000000000293 R12: 000072068b7fc4f0
2025-05-18T02:49:10.270806+05:30 blr-r29-26u kernel: R13: 00000000196e4390 R14: 00000000196e4398 R15: 000000000000004f
2025-05-18T02:49:10.270811+05:30 blr-r29-26u kernel:  </TASK>
{noformat}
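
For anyone triaging a longer syslog, a minimal Python sketch like the one below could pull out every hung-task header of the kind shown above. It assumes each log entry has been rejoined onto a single line (the mail wrapping undone); the function name `find_hung_tasks` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# Matches the kernel hung-task header, for example:
# "<timestamp> <host> kernel: INFO: task maim:1009904 blocked for more than 122 seconds."
HUNG_TASK_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+kernel:\s+INFO: task (?P<comm>.+):(?P<pid>\d+)\s+"
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(lines):
    """Return one dict per hung-task header found in unwrapped syslog lines."""
    events = []
    for line in lines:
        match = HUNG_TASK_RE.match(line.strip())
        if match:
            events.append({
                "timestamp": match.group("ts"),
                "host": match.group("host"),
                "task": match.group("comm"),
                "pid": int(match.group("pid")),
                "blocked_secs": int(match.group("secs")),
            })
    return events


# Two lines taken from the excerpt above, already unwrapped.
sample = [
    "2025-05-18T02:49:10.268427+05:30 blr-r29-26u kernel: INFO: task maim:1009904 "
    "blocked for more than 122 seconds.",
    "2025-05-18T02:49:10.270691+05:30 blr-r29-26u kernel: Call Trace:",
]
events = find_hung_tasks(sample)
print(events)
```

Counting the returned entries per task name gives a quick view of how often maim was reported as blocked before the drives dropped.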
After the maim process had been blocked continuously, the network configuration
was reset, which led to the loss of the RNIC IPs.

*From Syslog :*
{noformat}
2025-05-18T03:01:44.164571+05:30 blr-r29-26u kernel: systemd[1]: systemd-journald.service: Main process exited, code=killed, status=9/KILL
2025-05-18T03:01:05.264346+05:30 blr-r29-26u systemd[1]: systemd-networkd.service: Failed with result 'timeout'.
2025-05-18T03:01:05.264988+05:30 blr-r29-26u systemd[1]: Failed to start systemd-networkd.service - Network Configuration.
2025-05-18T03:01:05.266544+05:30 blr-r29-26u systemd[1]: systemd-networkd.service: Scheduled restart job, restart counter is at 2.
2025-05-18T03:01:05.266758+05:30 blr-r29-26u systemd[1]: netplan-ovs-cleanup.service - OpenVSwitch configuration for cleanup was skipped because of an unmet condition check (ConditionFileIsExecutable=/usr/bin/ovs-vsctl).
2025-05-18T03:01:05.290811+05:30 blr-r29-26u systemd[1]: Starting systemd-networkd.service - Network Configuration...
2025-05-18T03:01:05.418246+05:30 blr-r29-26u systemd-networkd[1010564]: lo: Link UP
2025-05-18T03:01:05.418503+05:30 blr-r29-26u systemd-networkd[1010564]: lo: Gained carrier
2025-05-18T03:01:05.418725+05:30 blr-r29-26u systemd-networkd[1010564]: eno1: Link UP
2025-05-18T03:01:44.164663+05:30 blr-r29-26u kernel: systemd[1]: systemd-journald.service: Failed with result 'timeout'.
2025-05-18T03:01:44.164665+05:30 blr-r29-26u kernel: systemd[1]: Failed to start systemd-journald.service - Journal Service.
2025-05-18T03:01:44.164667+05:30 blr-r29-26u kernel: systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2.
2025-05-18T03:01:05.422331+05:30 blr-r29-26u systemd-networkd[1010564]: eno2: Link UP
2025-05-18T03:01:44.164671+05:30 blr-r29-26u kernel: systemd[1]: Starting systemd-journald.service - Journal Service...
2025-05-18T03:01:44.164672+05:30 blr-r29-26u kernel: systemd-journald[1010540]: Collecting audit messages is disabled.
2025-05-18T03:01:44.164674+05:30 blr-r29-26u kernel: systemd-journald[1010540]: File /var/log/journal/7c9e984161af4278a958cf98d0e66061/system.journal corrupted or uncleanly shut down, renaming and replacing.
2025-05-18T03:01:05.423549+05:30 blr-r29-26u systemd-networkd[1010564]: eno2: Gained carrier
2025-05-18T03:01:44.164676+05:30 blr-r29-26u kernel: systemd[1]: systemd-networkd.service: State 'final-sigterm' timed out. Killing.
2025-05-18T03:01:44.164684+05:30 blr-r29-26u kernel: systemd[1]: systemd-networkd.service: Killing process 1010382 (systemd-network) with signal SIGKILL.
2025-05-18T03:01:05.426972+05:30 blr-r29-26u systemd-networkd[1010564]: eno3: Link UP
2025-05-18T03:01:44.164685+05:30 blr-r29-26u kernel: systemd[1]: snapd.service: Processes still around after SIGKILL. Ignoring.
2025-05-18T03:01:05.430531+05:30 blr-r29-26u systemd-networkd[1010564]: eno4: Link UP
2025-05-18T03:01:44.164697+05:30 blr-r29-26u kernel: systemd[1]: Started systemd-journald.service - Journal Service.
2025-05-18T03:01:44.164706+05:30 blr-r29-26u kernel: nvme nvme42: I/O tag 0 (e000) type 4 opcode 0x18 (Admin Cmd) QID 0 timeout
2025-05-18T03:01:44.164707+05:30 blr-r29-26u kernel: nvme nvme42: starting error recovery
2025-05-18T03:01:05.432624+05:30 blr-r29-26u systemd-networkd[1010564]: enp130s0f0np0: Link UP
2025-05-18T03:01:44.164714+05:30 blr-r29-26u kernel: nvme nvme42: failed nvme_keep_alive_end_io error=10
2025-05-18T03:01:05.432740+05:30 blr-r29-26u systemd-networkd[1010564]: enp130s0f0np0: Gained carrier
2025-05-18T03:01:44.164724+05:30 blr-r29-26u kernel: nvme nvme42: Reconnecting in 10 seconds...
{noformat}
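
The excerpt above interleaves entries stamped 03:01:05 (systemd-networkd) with kernel-tagged entries stamped 03:01:44, which makes the sequence hard to follow. A small Python sketch such as the following could reorder unwrapped syslog lines by their leading timestamp. It assumes every line starts with an ISO 8601 timestamp as in the excerpt; the helper name `sort_by_timestamp` is hypothetical. The result is the recorded order only, which may differ from the order in which the events actually happened.

```python
from datetime import datetime


def sort_by_timestamp(lines):
    """Sort unwrapped syslog lines by their leading ISO 8601 timestamp."""
    def key(line):
        # The first whitespace-delimited field is the timestamp,
        # e.g. "2025-05-18T03:01:44.164706+05:30".
        stamp = line.split(maxsplit=1)[0]
        return datetime.fromisoformat(stamp)
    return sorted(lines, key=key)


# Three lines taken from the excerpt above, in their original order.
sample = [
    "2025-05-18T03:01:44.164706+05:30 blr-r29-26u kernel: nvme nvme42: "
    "I/O tag 0 (e000) type 4 opcode 0x18 (Admin Cmd) QID 0 timeout",
    "2025-05-18T03:01:05.264346+05:30 blr-r29-26u systemd[1]: "
    "systemd-networkd.service: Failed with result 'timeout'.",
    "2025-05-18T03:01:44.164707+05:30 blr-r29-26u kernel: nvme nvme42: "
    "starting error recovery",
]
ordered = sort_by_timestamp(sample)
for line in ordered:
    print(line)
```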

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2112304

Title:
  All Drives disconnected for both the paths when maim (Medusa IO) is
  blocked continuously followed by automatic network config reset

Status in linux package in Ubuntu:
  New
