So it may be that the write-back throttling (wbt) for the underlying
devices is getting confused about what the exact throttle rates are for
these devices and is somehow getting stuck. It may be worth experimenting
by disabling the throttling and seeing if this gets I/O working again.

For example, to disable wbt for a device /dev/sda use:

echo 0 | sudo tee /sys/block/sda/queue/wbt_lat_usec

and if you need to reset it back to the default:

echo -1 | sudo tee /sys/block/sda/queue/wbt_lat_usec

Use the appropriate block device name for each of the block devices you
have attached. It may even be worth setting wbt_lat_usec to 0 for all the
block devices in your pool as early as possible after boot to see if
this helps.
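
If there are several devices in the pool, something along these lines
might save some typing. This is only a rough sketch: the function name
set_wbt and the SYSFS_BLOCK override are made up for illustration (the
override just lets you try it against a scratch directory first), and the
device names are placeholders for whatever is in your pool. The
wbt_lat_usec file may be absent if the kernel or device does not support
wbt, so such devices are skipped with a message.

```shell
#!/bin/sh
# Sketch: write one wbt_lat_usec value to several block devices.
# SYSFS_BLOCK is a hypothetical override for dry runs; it defaults to
# the real sysfs location /sys/block.
set_wbt() {
    value="$1"
    shift
    root="${SYSFS_BLOCK:-/sys/block}"
    for dev in "$@"; do
        knob="$root/$dev/queue/wbt_lat_usec"
        if [ -w "$knob" ]; then
            # 0 disables wbt, -1 resets it to the default
            echo "$value" > "$knob"
        else
            echo "skipping $dev: $knob missing or not writable" >&2
        fi
    done
}
```

Assuming this is saved as set_wbt.sh, it would need to be run as root,
for example:

sudo sh -c '. ./set_wbt.sh; set_wbt 0 sda sdb'

and "set_wbt -1 sda sdb" in the same way to go back to the defaults.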

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1889110

Title:
  zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than
  120 seconds. "

Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  The ZFS filesystem becomes unresponsive, and subsequently the NFS
  shares become unresponsive. ESXi sees all paths down.

  This error appears 3 times in a row:

  
  [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds.
  [184383.479565]       Tainted: P          IO      5.4.0-42-generic #46-Ubuntu
  [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [184383.479655] txg_sync        D    0  4307      2 0x80004000
  [184383.479658] Call Trace:
  [184383.479670]  __schedule+0x2e3/0x740
  [184383.479673]  schedule+0x42/0xb0
  [184383.479676]  schedule_timeout+0x152/0x2f0
  [184383.479683]  ? __next_timer_interrupt+0xe0/0xe0
  [184383.479685]  io_schedule_timeout+0x1e/0x50
  [184383.479697]  __cv_timedwait_common+0x15e/0x1c0 [spl]
  [184383.479702]  ? wait_woken+0x80/0x80
  [184383.479710]  __cv_timedwait_io+0x19/0x20 [spl]
  [184383.479816]  zio_wait+0x11b/0x230 [zfs]
  [184383.479905]  ? __raw_spin_unlock+0x9/0x10 [zfs]
  [184383.479983]  dsl_pool_sync+0xbc/0x410 [zfs]
  [184383.480069]  spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs]
  [184383.480156]  spa_sync+0x312/0x5b0 [zfs]
  [184383.480245]  txg_sync_thread+0x27a/0x310 [zfs]
  [184383.480334]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
  [184383.480344]  thread_generic_wrapper+0x83/0xa0 [spl]
  [184383.480347]  kthread+0x104/0x140
  [184383.480356]  ? clear_bit+0x20/0x20 [spl]
  [184383.480358]  ? kthread_park+0x90/0x90
  [184383.480361]  ret_from_fork+0x35/0x40


  Then nfsd hangs as well.

  
  [184866.787445] INFO: task nfsd:6585 blocked for more than 120 seconds.
  [184866.787485]       Tainted: P          IO      5.4.0-42-generic #46-Ubuntu
  [184866.787526] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [184866.787573] nfsd            D    0  6585      2 0x80004000
  [184866.787575] Call Trace:
  [184866.787578]  __schedule+0x2e3/0x740
  [184866.787675]  ? __raw_spin_unlock+0x9/0x10 [zfs]
  [184866.787678]  schedule+0x42/0xb0
  [184866.787685]  cv_wait_common+0x133/0x180 [spl]
  [184866.787688]  ? wait_woken+0x80/0x80
  [184866.787695]  __cv_wait+0x15/0x20 [spl]
  [184866.787764]  dmu_tx_wait+0x1ee/0x210 [zfs]
  [184866.787834]  dmu_tx_assign+0x49/0x70 [zfs]
  [184866.787929]  zfs_write+0x461/0xd40 [zfs]
  [184866.788025]  ? atomic_sub_return.constprop.0+0xd/0x20 [zfs]
  [184866.788033]  ? atomic_dec+0xd/0x20 [spl]
  [184866.788116]  ? __raw_spin_unlock+0x9/0x10 [zfs]
  [184866.788122]  ? __d_obtain_alias+0x36/0x90
  [184866.788217]  zpl_write_common_iovec+0xad/0x120 [zfs]
  [184866.788313]  zpl_iter_write_common+0x8e/0xb0 [zfs]
  [184866.788409]  zpl_iter_write+0x56/0x90 [zfs]
  [184866.788413]  do_iter_readv_writev+0x14f/0x1d0
  [184866.788416]  do_iter_write+0x84/0x1a0
  [184866.788418]  vfs_iter_write+0x19/0x30
  [184866.788442]  nfsd_vfs_write+0xe0/0x480 [nfsd]
  [184866.788454]  nfsd_write+0x7a/0x160 [nfsd]
  [184866.788458]  ? kmem_cache_alloc+0x16d/0x230
  [184866.788472]  nfsd3_proc_write+0xc3/0x170 [nfsd]
  [184866.788483]  nfsd_dispatch+0xd6/0x220 [nfsd]
  [184866.788508]  svc_process_common+0x3af/0x700 [sunrpc]
  [184866.788527]  ? svc_sock_secure_port+0x16/0x30 [sunrpc]
  [184866.788538]  ? nfsd_svc+0x2d0/0x2d0 [nfsd]
  [184866.788557]  svc_process+0xd9/0x110 [sunrpc]
  [184866.788568]  nfsd+0xe8/0x150 [nfsd]
  [184866.788570]  kthread+0x104/0x140
  [184866.788581]  ? nfsd_destroy+0x60/0x60 [nfsd]
  [184866.788583]  ? kthread_park+0x90/0x90
  [184866.788585]  ret_from_fork+0x35/0x40


  Linux zfs-01 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC
  2020 x86_64 x86_64 x86_64 GNU/Linux

  root@zfs-01:/# lsb_release -rd
  Description:    Ubuntu 20.04 LTS
  Release:        20.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1889110/+subscriptions
