Also affects me. Some more information: After it gives that message, the
I/O for any ZFS device gets really horrible (NVMe SSDs get a maximum
throuput of 10MB/s while normally does 400MB/s) and only solves by
restarting.

Things I have tried based on what I saw over the internet for the same
issue:

1. Removing all swap
2. Reducing / Increasing the ARC RAM (always keeping at least 16GB of RAM free 
in the OS)
3. Checked smart data for all disks (everything is fine)

After rebooting, it _usually_ takes one or two days before the issue
happens again. The performance is normal while it doesn't happen. I have
several virtual machines that uses both SSD and disk drives. It usually
starts running some backup routines from 10 PM to midnight, so it might
be high I/O related stuff (although the backups usually only take a few
minutes to do).


My dmesg is a bit different though, I get this repeated about 4 ou 5 times with 
20 minutes interval, and the it usually stops reporting it.

[sex fev 24 02:31:53 2023] INFO: task txg_sync:2457 blocked for more than 120 
seconds.
[sex fev 24 02:31:53 2023]       Tainted: P           O      5.15.0-46-generic 
#49-Ubuntu
[sex fev 24 02:31:53 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[sex fev 24 02:31:53 2023] task:txg_sync        state:D stack:    0 pid: 2457 
ppid:     2 flags:0x00004000
[sex fev 24 02:31:53 2023] Call Trace:
[sex fev 24 02:31:53 2023]  <TASK>
[sex fev 24 02:31:53 2023]  __schedule+0x23d/0x590
[sex fev 24 02:31:53 2023]  schedule+0x4e/0xc0
[sex fev 24 02:31:53 2023]  schedule_timeout+0x87/0x140
[sex fev 24 02:31:53 2023]  ? zio_issue_async+0x12/0x20 [zfs]
[sex fev 24 02:31:53 2023]  ? __bpf_trace_tick_stop+0x20/0x20
[sex fev 24 02:31:53 2023]  io_schedule_timeout+0x51/0x80
[sex fev 24 02:31:53 2023]  __cv_timedwait_common+0x12c/0x170 [spl]
[sex fev 24 02:31:53 2023]  ? wait_woken+0x70/0x70
[sex fev 24 02:31:53 2023]  __cv_timedwait_io+0x19/0x20 [spl]
[sex fev 24 02:31:53 2023]  zio_wait+0x116/0x220 [zfs]
[sex fev 24 02:31:53 2023]  dsl_pool_sync+0xb6/0x400 [zfs]
[sex fev 24 02:31:53 2023]  ? __mod_timer+0x214/0x400
[sex fev 24 02:31:53 2023]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[sex fev 24 02:31:53 2023]  spa_sync+0x2dc/0x5b0 [zfs]
[sex fev 24 02:31:53 2023]  txg_sync_thread+0x266/0x2f0 [zfs]
[sex fev 24 02:31:53 2023]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[sex fev 24 02:31:53 2023]  thread_generic_wrapper+0x64/0x80 [spl]
[sex fev 24 02:31:53 2023]  ? __thread_exit+0x20/0x20 [spl]
[sex fev 24 02:31:53 2023]  kthread+0x12a/0x150
[sex fev 24 02:31:53 2023]  ? set_kthread_struct+0x50/0x50
[sex fev 24 02:31:53 2023]  ret_from_fork+0x22/0x30
[sex fev 24 02:31:53 2023]  </TASK>


Machine:
- Lenovo RD450
- Dual Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
- 192GB RAM DDR4
- 2x 4TB Disks WD RED (ZFS Mirror)
- 2x 2TB Crucial NVMe (ZFS Mirror)
- 2x 8TB Disks WD RED (ZFS Mirror)
- 512GB WD Green SSD (OS Only)
- QLCNIC 10Gbps NIC

OS:
- Distributor ID:       Ubuntu
- Description:  Ubuntu 22.04.1 LTS
- Release:      22.04
- Codename:     jammy

ZFS:

zfs-2.1.4-0ubuntu0.1
zfs-kmod-2.1.4-0ubuntu0.1

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1998870

Title:
  ZFS Pool hangs for more than 120 seconds under high I/O load

Status in zfs-linux package in Ubuntu:
  Confirmed

Bug description:
  I see the issue happening when:
    1.- deleting a FS --> sudo zfs destroy -r data/VMs
    2.- then renaming a new FS as the (old) deleted --> sudo zfs rename 
data/VMsNew data/VMs
    3.- removing data in other FS in the same zpool --> rm -rf * 
data/BackupsNew/*
    4.- Also I've seen this messages when running a rsync operation between two 
FS in the same pool.


  [Dec 5 13:42] INFO: task txg_sync:5468 blocked for more than 120 seconds.
  [  +0,000016]       Tainted: P           O      5.15.0-56-generic #62-Ubuntu
  [  +0,000006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  +0,000004] task:txg_sync        state:D stack:    0 pid: 5468 ppid:     2 
flags:0x00004000
  [  +0,000013] Call Trace:
  [  +0,000005]  <TASK>
  [  +0,000007]  __schedule+0x24e/0x590
  [  +0,000016]  schedule+0x69/0x110
  [  +0,000009]  schedule_timeout+0x87/0x140
  [  +0,000011]  ? __bpf_trace_tick_stop+0x20/0x20
  [  +0,000011]  io_schedule_timeout+0x51/0x80
  [  +0,000011]  __cv_timedwait_common+0x12c/0x170 [spl]
  [  +0,000031]  ? wait_woken+0x70/0x70
  [  +0,000011]  __cv_timedwait_io+0x19/0x20 [spl]
  [  +0,000027]  zio_wait+0x116/0x220 [zfs]
  [  +0,000477]  ? spa_sync_aux_dev+0x280/0x280 [zfs]
  [  +0,000404]  spa_sync_frees+0x40/0x90 [zfs]
  [  +0,000396]  spa_sync_iterate_to_convergence+0x10d/0x1f0 [zfs]
  [  +0,000399]  spa_sync+0x2dc/0x5b0 [zfs]
  [  +0,000393]  txg_sync_thread+0x266/0x2f0 [zfs]
  [  +0,000418]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
  [  +0,000416]  thread_generic_wrapper+0x61/0x80 [spl]
  [  +0,000035]  ? __thread_exit+0x20/0x20 [spl]
  [  +0,000034]  kthread+0x127/0x150
  [  +0,000011]  ? set_kthread_struct+0x50/0x50
  [  +0,000012]  ret_from_fork+0x1f/0x30
  [  +0,000017]  </TASK>
  [Dec 5 13:44] INFO: task rsync:165214 blocked for more than 120 seconds.
  [  +0,000015]       Tainted: P           O      5.15.0-56-generic #62-Ubuntu
  [  +0,000006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  +0,000004] task:rsync           state:D stack:    0 pid:165214 ppid:     1 
flags:0x00000006
  [  +0,000013] Call Trace:
  [  +0,000004]  <TASK>
  [  +0,000007]  __schedule+0x24e/0x590
  [  +0,000014]  ? kmem_cache_free+0x24f/0x290
  [  +0,000015]  schedule+0x69/0x110
  [  +0,000012]  cv_wait_common+0xf8/0x130 [spl]
  [  +0,000029]  ? wait_woken+0x70/0x70
  [  +0,000012]  __cv_wait+0x15/0x20 [spl]
  [  +0,000027]  dmu_tx_wait+0x8e/0x1e0 [zfs]
  [  +0,000347]  dmu_tx_assign+0x49/0x80 [zfs]
  [  +0,000322]  zfs_write+0x45c/0xdb0 [zfs]
  [  +0,000465]  ? unix_stream_recvmsg+0x8c/0xa0
  [  +0,000010]  ? generic_write_checks+0x65/0xc0
  [  +0,000011]  zpl_iter_write+0xe7/0x130 [zfs]
  [  +0,000443]  new_sync_write+0x111/0x1a0
  [  +0,000011]  vfs_write+0x1d5/0x270
  [  +0,000008]  ksys_write+0x67/0xf0
  [  +0,000008]  __x64_sys_write+0x19/0x20
  [  +0,000008]  do_syscall_64+0x59/0xc0
  [  +0,000009]  ? syscall_exit_to_user_mode+0x27/0x50
  [  +0,000010]  ? __x64_sys_read+0x19/0x20
  [  +0,000007]  ? do_syscall_64+0x69/0xc0
  [  +0,000007]  ? do_syscall_64+0x69/0xc0
  [  +0,000006]  ? do_syscall_64+0x69/0xc0
  [  +0,000008]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
  [  +0,000012] RIP: 0033:0x7f6b39c5fa37
  [  +0,000008] RSP: 002b:00007fff8fe554f8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000001
  [  +0,000009] RAX: ffffffffffffffda RBX: 00007f6b390f2010 RCX: 
00007f6b39c5fa37
  [  +0,000006] RDX: 0000000000040000 RSI: 00007f6b390f2010 RDI: 
0000000000000001
  [  +0,000005] RBP: 0000000000000001 R08: 00007f6b3912a010 R09: 
0000000000000010
  [  +0,000005] R10: 000000000000000f R11: 0000000000000246 R12: 
0000000000040000
  [  +0,000005] R13: 0000000000008000 R14: 0000000000008000 R15: 
0000557bb34da730
  [  +0,000010]  </TASK>
  [Dec 6 01:45] INFO: task txg_sync:5468 blocked for more than 120 seconds.
  [  +0,000016]       Tainted: P           O      5.15.0-56-generic #62-Ubuntu
  [  +0,000006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  +0,000004] task:txg_sync        state:D stack:    0 pid: 5468 ppid:     2 
flags:0x00004000
  [  +0,000013] Call Trace:
  [  +0,000004]  <TASK>
  [  +0,000007]  __schedule+0x24e/0x590
  [  +0,000016]  schedule+0x69/0x110
  [  +0,000009]  schedule_timeout+0x87/0x140
  [  +0,000011]  ? __bpf_trace_tick_stop+0x20/0x20
  [  +0,000011]  io_schedule_timeout+0x51/0x80
  [  +0,000012]  __cv_timedwait_common+0x12c/0x170 [spl]
  [  +0,000030]  ? wait_woken+0x70/0x70
  [  +0,000012]  __cv_timedwait_io+0x19/0x20 [spl]
  [  +0,000027]  zio_wait+0x116/0x220 [zfs]
  [  +0,000477]  ? spa_sync_aux_dev+0x280/0x280 [zfs]
  [  +0,000405]  spa_sync_frees+0x40/0x90 [zfs]
  [  +0,000399]  spa_sync_iterate_to_convergence+0x10d/0x1f0 [zfs]
  [  +0,000397]  spa_sync+0x2dc/0x5b0 [zfs]
  [  +0,000393]  txg_sync_thread+0x266/0x2f0 [zfs]
  [  +0,000417]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
  [  +0,000417]  thread_generic_wrapper+0x61/0x80 [spl]
  [  +0,000036]  ? __thread_exit+0x20/0x20 [spl]
  [  +0,000032]  kthread+0x127/0x150
  [  +0,000012]  ? set_kthread_struct+0x50/0x50
  [  +0,000011]  ret_from_fork+0x1f/0x30
  [  +0,000017]  </TASK>
  [Dec 6 01:47] INFO: task txg_sync:5468 blocked for more than 241 seconds.
  [  +0,000015]       Tainted: P           O      5.15.0-56-generic #62-Ubuntu
  [  +0,000006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  +0,000005] task:txg_sync        state:D stack:    0 pid: 5468 ppid:     2 
flags:0x00004000
  [  +0,000012] Call Trace:
  [  +0,000007]  <TASK>
  [  +0,000007]  __schedule+0x24e/0x590
  [  +0,000017]  schedule+0x69/0x110
  [  +0,000008]  schedule_timeout+0x87/0x140
  [  +0,000012]  ? __bpf_trace_tick_stop+0x20/0x20
  [  +0,000010]  io_schedule_timeout+0x51/0x80
  [  +0,000012]  __cv_timedwait_common+0x12c/0x170 [spl]
  [  +0,000031]  ? wait_woken+0x70/0x70
  [  +0,000011]  __cv_timedwait_io+0x19/0x20 [spl]
  [  +0,000028]  zio_wait+0x116/0x220 [zfs]
  [  +0,000508]  ? spa_sync_aux_dev+0x280/0x280 [zfs]
  [  +0,000419]  spa_sync_frees+0x40/0x90 [zfs]
  [  +0,000398]  spa_sync_iterate_to_convergence+0x10d/0x1f0 [zfs]
  [  +0,000398]  spa_sync+0x2dc/0x5b0 [zfs]
  [  +0,000391]  txg_sync_thread+0x266/0x2f0 [zfs]
  [  +0,000418]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
  [  +0,000417]  thread_generic_wrapper+0x61/0x80 [spl]
  [  +0,000036]  ? __thread_exit+0x20/0x20 [spl]
  [  +0,000033]  kthread+0x127/0x150
  [  +0,000012]  ? set_kthread_struct+0x50/0x50
  [  +0,000011]  ret_from_fork+0x1f/0x30
  [  +0,000017]  </TASK>

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1998870/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to