[Original Description]

One LXC user reported lots of processes stuck in D state:
several threads waiting on the memory shrinker semaphore
(a symptom previously thought to have been fixed via LP
bug 1817628).

After some time, a provided crashdump revealed the issue
to be in ZFS's inode eviction path running in the memory
shrinker path (thus holding the semaphore, as observed
above).

The stack trace shows that the inode memory shrinker
entered ZFS and is looping in zfs_zget():

PID: 42105  TASK: ffff881169f3d400  CPU: 36  COMMAND: "lxcfs"
 #0 [ffff88103ea88e38] crash_nmi_callback at ffffffff810518a7
 #1 [ffff88103ea88e48] nmi_handle at ffffffff810323ae
 #2 [ffff88103ea88ea0] default_do_nmi at ffffffff810328f4
 #3 [ffff88103ea88ec0] do_nmi at ffffffff81032aa2
 #4 [ffff88103ea88ee8] end_repeat_nmi at ffffffff8185a587
    [exception RIP: _raw_spin_lock+20]
    RIP: ffffffff81857464  RSP: ffff881a23bab138  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffff8810a11afb78  RCX: ffff881e7ad76858
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff8810a11afb78
    RBP: ffff881a23bab138   R8: 000000000001a6a0   R9: ffffffffc05e384a
    R10: ffffea0070071400  R11: ffff88014e96d340  R12: 0000000000000000
    R13: ffff8810a11afb50  R14: ffff88014e96d340  R15: ffff8810a11afaf8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff881a23bab138] _raw_spin_lock at ffffffff81857464
 #6 [ffff881a23bab140] dbuf_read at ffffffffc08c141a [zfs]
 #7 [ffff881a23bab1e8] dnode_hold_impl at ffffffffc08db218 [zfs]
 #8 [ffff881a23bab250] dnode_hold at ffffffffc08db659 [zfs]
 #9 [ffff881a23bab260] dmu_bonus_hold at ffffffffc08ca2b6 [zfs]
#10 [ffff881a23bab2a0] sa_buf_hold at ffffffffc09023fe [zfs]
#11 [ffff881a23bab2b0] zfs_zget at ffffffffc095cb47 [zfs]
#12 [ffff881a23bab350] zfs_purgedir at ffffffffc093be54 [zfs]
#13 [ffff881a23bab558] zfs_rmnode at ffffffffc093c212 [zfs]
#14 [ffff881a23bab5a0] zfs_zinactive at ffffffffc095d2f8 [zfs]
#15 [ffff881a23bab5d8] zfs_inactive at ffffffffc0956671 [zfs]
#16 [ffff881a23bab628] zpl_evict_inode at ffffffffc096dc03 [zfs]
#17 [ffff881a23bab650] evict at ffffffff81233d81
#18 [ffff881a23bab678] dispose_list at ffffffff81233e86
#19 [ffff881a23bab690] prune_icache_sb at ffffffff81234fea
#20 [ffff881a23bab6c8] super_cache_scan at ffffffff8121b862
#21 [ffff881a23bab720] shrink_slab at ffffffff811a8e0d
#22 [ffff881a23bab800] shrink_zone at ffffffff811ad488
#23 [ffff881a23bab880] do_try_to_free_pages at ffffffff811ad5fb
#24 [ffff881a23bab900] try_to_free_pages at ffffffff811ad91e
#25 [ffff881a23bab980] __alloc_pages_slowpath.constprop.88 at ffffffff8119ee92
#26 [ffff881a23baba60] __alloc_pages_nodemask at ffffffff8119f908
#27 [ffff881a23babb00] alloc_pages_current at ffffffff811ea47c
#28 [ffff881a23babb48] alloc_kmem_pages at ffffffff8119d4d9
#29 [ffff881a23babb70] kmalloc_order_trace at ffffffff811bb04e
#30 [ffff881a23babbb0] __kmalloc at ffffffff811f6e90
#31 [ffff881a23babbf8] seq_buf_alloc at ffffffff8123ca00
#32 [ffff881a23babc10] single_open_size at ffffffff8123dc1a
#33 [ffff881a23babc50] stat_open at ffffffff8128fc76
#34 [ffff881a23babc68] proc_reg_open at ffffffff81286011
#35 [ffff881a23babca0] do_dentry_open at ffffffff81215a02
#36 [ffff881a23babce0] vfs_open at ffffffff81216b94
#37 [ffff881a23babd08] path_openat at ffffffff81226bac
#38 [ffff881a23babdc8] do_filp_open at ffffffff81228b41
#39 [ffff881a23babed0] do_sys_open at ffffffff81216f68
#40 [ffff881a23babf40] sys_open at ffffffff812170ee
#41 [ffff881a23babf50] entry_SYSCALL_64_fastpath at ffffffff818576ce

This stack trace closely matches upstream ZFS GitHub
issue #4816, whose patches were merged in ZFS GitHub
pull request #4827 (see the LP comments for details).

Two other threads were found in ZFS code, but were not
checked for a relation to the thread above, since the
upstream GitHub issue already closely matched and
described that stack trace.

PID: 56179  TASK: ffff880106189c00  CPU: 3   COMMAND: "crond"
 #0 [ffff88203de48e38] crash_nmi_callback at ffffffff810518a7
 #1 [ffff88203de48e48] nmi_handle at ffffffff810323ae
 #2 [ffff88203de48ea0] default_do_nmi at ffffffff810328f4
 #3 [ffff88203de48ec0] do_nmi at ffffffff81032aa2
 #4 [ffff88203de48ee8] end_repeat_nmi at ffffffff8185a587
    [exception RIP: _raw_spin_lock+20]
    RIP: ffffffff81857464  RSP: ffff880002dc7978  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffff8812ee29f168  RCX: 0000000000000001
    RDX: 0000000000000001  RSI: 0000000000000202  RDI: ffff8812ee29f1f0
    RBP: ffff880002dc7978   R8: ffff880115e22c00   R9: 0000008000000000
    R10: ffffea004758ac00  R11: ffffffffffffc000  R12: ffff8812ee29f1f0
    R13: 0000000000000000  R14: ffff8812ee29ef50  R15: ffff880eefad8060
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff880002dc7978] _raw_spin_lock at ffffffff81857464
 #6 [ffff880002dc7980] igrab at ffffffff81233a4e
 #7 [ffff880002dc79a0] zfs_zget at ffffffffc095cad6 [zfs]
 #8 [ffff880002dc7a40] zfs_dirent_lock at ffffffffc093ae99 [zfs]
 #9 [ffff880002dc7ae8] zfs_dirlook at ffffffffc093afd0 [zfs]
#10 [ffff880002dc7b50] zfs_lookup at ffffffffc0950716 [zfs]
#11 [ffff880002dc7bc0] zpl_lookup at ffffffffc096d147 [zfs]
#12 [ffff880002dc7c50] lookup_real at ffffffff81221c83
#13 [ffff880002dc7c70] __lookup_hash at ffffffff812235a2
#14 [ffff880002dc7ca0] walk_component at ffffffff8122457c
#15 [ffff880002dc7d00] path_lookupat at ffffffff812262cd
#16 [ffff880002dc7d28] filename_lookup at ffffffff81227f21
#17 [ffff880002dc7e40] user_path_at_empty at ffffffff812280c6
#18 [ffff880002dc7e68] vfs_fstatat at ffffffff8121cef6
#19 [ffff880002dc7eb8] SYSC_newstat at ffffffff8121d44e
#20 [ffff880002dc7f40] sys_newstat at ffffffff8121d5de
#21 [ffff880002dc7f50] entry_SYSCALL_64_fastpath at ffffffff818576ce
    RIP: 00007f297d605895  RSP: 00007ffeb517f2e8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000000000a0  RCX: 00007f297d605895
    RDX: 00007ffeb517f300  RSI: 00007ffeb517f300  RDI: 00007f297dd1b8b0
    RBP: 00007f297d8dd760   R8: 0000000000000040   R9: 0000000000000060
    R10: 00007ffeb517ed60  R11: 0000000000000246  R12: 000055847274a6b0
    R13: 000055847274a5f0  R14: 0000000000000020  R15: 00007f297d8dd7b8
    ORIG_RAX: 0000000000000004  CS: 0033  SS: 002b

PID: 62590  TASK: ffff881c96c31c00  CPU: 6   COMMAND: "in:imjournal"
 #0 [ffff88103e6c8e38] crash_nmi_callback at ffffffff810518a7
 #1 [ffff88103e6c8e48] nmi_handle at ffffffff810323ae
 #2 [ffff88103e6c8ea0] default_do_nmi at ffffffff810328f4
 #3 [ffff88103e6c8ec0] do_nmi at ffffffff81032aa2
 #4 [ffff88103e6c8ee8] end_repeat_nmi at ffffffff8185a587
    [exception RIP: zrl_add+34]
    RIP: ffffffffc09704d2  RSP: ffff880999453860  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 000000000000754f  RCX: ffff880b9521a9d8
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff880b9521a9a0
    RBP: ffff880999453870   R8: 0000000000000000   R9: 000000000000000e
    R10: ffffea0024d0f600  R11: ffffffffffffc000  R12: 00000000000008e8
    R13: ffff881e7ad76800  R14: 0000000000000001  R15: ffff8807b7f5ca68
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff880999453860] zrl_add at ffffffffc09704d2 [zfs]
 #6 [ffff880999453878] dnode_hold_impl at ffffffffc08db279 [zfs]
 #7 [ffff8809994538e0] dnode_hold at ffffffffc08db659 [zfs]
 #8 [ffff8809994538f0] dmu_bonus_hold at ffffffffc08ca2b6 [zfs]
 #9 [ffff880999453930] sa_buf_hold at ffffffffc09023fe [zfs]
#10 [ffff880999453940] zfs_zget at ffffffffc095cb47 [zfs]
#11 [ffff8809994539e0] zfs_dirent_lock at ffffffffc093ae99 [zfs]
#12 [ffff880999453a88] zfs_dirlook at ffffffffc093afd0 [zfs]
#13 [ffff880999453af0] zfs_lookup at ffffffffc0950716 [zfs]
#14 [ffff880999453b60] zpl_lookup at ffffffffc096d147 [zfs]
#15 [ffff880999453bf0] lookup_real at ffffffff81221c83
#16 [ffff880999453c10] __lookup_hash at ffffffff812235a2
#17 [ffff880999453c40] walk_component at ffffffff8122457c
#18 [ffff880999453ca0] link_path_walk at ffffffff81225dc1
#19 [ffff880999453d08] path_openat at ffffffff812266b9
#20 [ffff880999453dc8] do_filp_open at ffffffff81228b41
#21 [ffff880999453ed0] do_sys_open at ffffffff81216f68
#22 [ffff880999453f40] sys_open at ffffffff812170ee
#23 [ffff880999453f50] entry_SYSCALL_64_fastpath at ffffffff818576ce
    RIP: 00007f9b04897d4d  RSP: 00007f9b01260b40  RFLAGS: 00000293
    RAX: ffffffffffffffda  RBX: 00007f9af4000020  RCX: 00007f9b04897d4d
    RDX: 00000000000001b6  RSI: 0000000000000241  RDI: 00007f9b01260be0
    RBP: 00007f9af4000020   R8: 00007f9b03b64a7c   R9: 0000000000000240
    R10: 0000000000000024  R11: 0000000000000293  R12: 00007f9b01260d90
    R13: 00007f9af40cc480  R14: 0000000000000050  R15: 0000000000000007
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

https://bugs.launchpad.net/bugs/1839521

Title:
  Xenial: ZFS deadlock in shrinker path with xattrs

Status in zfs-linux package in Ubuntu:
  Invalid
Status in zfs-linux source package in Xenial:
  In Progress
Status in zfs-linux source package in Bionic:
  Invalid
Status in zfs-linux source package in Disco:
  Invalid
Status in zfs-linux source package in Eoan:
  Invalid

Bug description:
  [Impact]

   * Xenial's ZFS can deadlock in the memory shrinker path
     after removing files with extended attributes (xattr).

   * Extended attributes are enabled by default, but are
     _not_ used by default, which reduces the likelihood.

   * It is very difficult/rare to reproduce this problem,
     due to the file/xattr/remove/shrinker/LRU ordering
     and timing circumstances required (it took weeks for
     the reporting user), but a synthetic test case has
     been devised for testing.

  [Test Case]

   * A synthetic reproducer is available in this LP bug,
     with a few steps to touch/setfattr/rm/drop_caches
     plus a kernel module to massage the disposal list
     (the user-space steps are sketched below).
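
   A minimal sketch of the user-space steps follows. This
   is a hedged illustration only: the dataset path and the
   xattr name are placeholders, and the disposal-list
   kernel module from the LP comments is still required to
   widen the race window reliably.

     /* repro.c - user-space half of the synthetic reproducer.
      * The dataset path and xattr name below are placeholders.
      */
     #include <fcntl.h>
     #include <stdio.h>
     #include <sys/xattr.h>
     #include <unistd.h>

     int main(void)
     {
         /* touch: create a file on the ZFS dataset */
         int fd = open("/tank/testfile", O_CREAT | O_WRONLY, 0644);
         if (fd < 0) { perror("open"); return 1; }

         /* setfattr: add an xattr so ZFS creates the hidden
          * xattr dir inode plus the xattr child inode */
         if (fsetxattr(fd, "user.test", "1", 1, 0) < 0)
             perror("fsetxattr");
         close(fd);

         /* rm: remove the file; the xattr dir inode is left
          * to be purged by a later shrinker invocation */
         if (unlink("/tank/testfile") < 0)
             perror("unlink");

         /* drop_caches: writing "2" reclaims slab objects
          * (dentries/inodes), forcing the inode shrinker */
         fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
         if (fd < 0) { perror("open drop_caches"); return 1; }
         if (write(fd, "2", 1) < 0)
             perror("write");
         close(fd);
         return 0;
     }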

   * In the original ZFS module:
     the xattr dir inode is not purged immediately on
     file removal, but possibly only _two_ shrinker
     invocations later.  This allows another thread,
     started before the file removal, to call zfs_zget()
     on the xattr child inode and iput() it, so that it
     ends up on the same disposal list as the xattr dir
     inode (the resulting loop is modeled in the sketch
     after this list).

   * In the modified ZFS module:
     the xattr dir inode is purged immediately on file
     removal, not possibly later on a shrinker invocation,
     so the problem window above no longer exists.
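
   To make the loop concrete, here is a toy user-space
   model of the original behavior (all names are stand-ins
   for the kernel/ZFS routines seen in the stack traces;
   this is an illustration, not ZFS code).  One shrinker
   thread walks its disposal list in order; evicting the
   xattr dir needs a zget() on the xattr child still
   queued on the same list, so the igrab()-style check can
   never succeed:

     /* livelock_model.c - toy model of the disposal-list loop.
      * All names are stand-ins for kernel/ZFS routines.
      */
     #include <stdbool.h>
     #include <stdio.h>

     struct fake_inode {
         const char *name;
         bool freeing;                  /* models I_FREEING */
         struct fake_inode *xattr_child;
     };

     /* models igrab(): fails while the inode awaits eviction */
     static struct fake_inode *grab(struct fake_inode *ip)
     {
         return ip->freeing ? NULL : ip;
     }

     /* models zfs_zget(): retries until grab() succeeds */
     static void zget(struct fake_inode *ip)
     {
         for (int tries = 1; grab(ip) == NULL; tries++) {
             if (tries >= 3) {  /* cap the real infinite loop */
                 printf("  zget(%s) would loop forever: inode "
                        "is on the same disposal list\n", ip->name);
                 return;
             }
         }
     }

     /* models evict() -> zfs_rmnode() -> zfs_purgedir() */
     static void evict(struct fake_inode *ip)
     {
         printf("evict(%s)\n", ip->name);
         if (ip->xattr_child)
             zget(ip->xattr_child);
     }

     int main(void)
     {
         struct fake_inode child = { "xattr child", true, NULL };
         struct fake_inode dir   = { "xattr dir", true, &child };

         /* original module: dir and child end up on the same
          * disposal list, walked by one thread in order */
         struct fake_inode *dispose_list[] = { &dir, &child };

         for (int i = 0; i < 2; i++)
             evict(dispose_list[i]);
         return 0;
     }

   With the modified module the xattr dir inode is purged
   at file-removal time instead, so the dir and its xattr
   child never meet on one disposal list and the loop
   above cannot occur.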

  [Regression Potential]

   * Low. The patches are confined to extended attributes
     in ZFS, specifically node removal/purge, plus a
     change to how an xattr child inode tracks its xattr
     dir (parent) inode, so that it can be purged
     immediately on removal.

   * The ZFS test suite has been run against the original
     and modified zfs-dkms packages/kernel modules, with
     no regressions.
