[Kernel-packages] [Bug 2089410] Re: NFS: fix deadlock with pNFS flexfiles IO retry error path

Mike Snitzer Mon, 16 Dec 2024 10:05:50 -0800

This has now been released in 5.15.174:

commit 31545f4b7cdb6da6a0519120b8c96dc40f186aac
Author: Trond Myklebust <trond.mykleb...@hammerspace.com>
Date:   Mon Aug 1 14:16:51 2022 -0400


    NFS: nfs_async_write_reschedule_io must not recurse into the writeback code

    commit b1a28f2eb9ea7a5a1763fe53fe699aa0feae4231 upstream.

    It is not safe to call filemap_fdatawrite_range() from
    nfs_async_write_reschedule_io(), since we're often calling from a page
    reclaim context. Just let fsync() redrive the writeback for us.

    Signed-off-by: Trond Myklebust <trond.mykleb...@hammerspace.com>
    Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2089410

Title:
  NFS: fix deadlock with pNFS flexfiles IO retry error path

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Triaged

Bug description:
  SRU Justification:

  Impact: In production at a mutual "hyperscaler" customer that is using
  the Ubuntu Jammy kernel's NFS client with Hammerspace's pNFS
  flexfiles layout, an NFS client deadlock occurred due to upstream commit
  7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on
  rescheduled writes"). The deadlock was later fixed by upstream commit
  b1a28f2eb9ea7 ("NFS: nfs_async_write_reschedule_io must not recurse
  into the writeback code") in August 2022, but unfortunately that fix
  was not marked for stable@ at the time. That has since been rectified,
  and Greg Kroah-Hartman has now picked it up for the next
  stable/linux-5.15.y kernel (it has not yet appeared in the stable
  repo); please see:
  https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/

  Fix: Apply upstream commit b1a28f2eb9ea7 ("NFS:
  nfs_async_write_reschedule_io must not recurse into the writeback
  code"). That commit was developed by Trond Myklebust, the upstream
  Linux NFS client maintainer.

  Testcase: Cause buffered IO issued by an NFS client using pNFS
  flexfiles to hit error paths. In heavy enterprise use this happens for
  two reasons: container memory limits make OOM inside a container, and
  therefore memory allocation errors, particularly likely, _and_ NFS IO
  must also be retransmitted for other reasons, e.g. volume down/up
  bounces. This can lead to a deadlock in NFS due to recursion with page
  locks already held, e.g.:
  [<0>] wait_on_page_bit_common+0x10c/0x3d0
  [<0>] wait_on_page_bit+0x3f/0x50
  [<0>] wait_on_page_writeback+0x26/0x80
  [<0>] write_cache_pages+0x138/0x460
  [<0>] nfs_writepages+0x10d/0x200 [nfs]
  [<0>] do_writepages+0xd4/0x200
  [<0>] filemap_fdatawrite_wbc+0x89/0xe0
  [<0>] filemap_fdatawrite_range+0x54/0x70
  [<0>] nfs_async_write_reschedule_io+0x69/0x80 [nfs]
  [<0>] ff_layout_reset_write+0x73/0xe0 [nfs_layout_flexfiles]
  [<0>] ff_layout_write_release+0x7a/0x90 [nfs_layout_flexfiles]
  [<0>] rpc_free_task+0x3d/0x70 [sunrpc]
  [<0>] rpc_async_release+0x30/0x50 [sunrpc]
  [<0>] process_one_work+0x228/0x3d0
  [<0>] worker_thread+0x53/0x420
  [<0>] kthread+0x127/0x150
  [<0>] ret_from_fork+0x1f/0x30
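When triaging hung-task reports, it may help to check a stack dump for the signature in the trace above: writeback (filemap_fdatawrite_range) entered from inside an RPC release callback (rpc_async_release). The helper below is a hypothetical sketch in C, not an existing kernel or distro tool. It assumes /proc/<pid>/stack-style lines with the innermost frame first, and it matches only the two symbol names taken from this trace, so a match is a hint worth investigating, not proof of this deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy the symbol name out of a /proc/<pid>/stack-style line such as
 * "[<0>] nfs_writepages+0x10d/0x200 [nfs]" into buf. Returns false if the
 * line does not look like a stack frame or the name does not fit. */
static bool frame_symbol(const char *line, char *buf, size_t buflen)
{
    assert(line != NULL && buf != NULL);
    const char *start = strstr(line, "] ");
    if (start == NULL)
        return false;
    start += 2;                              /* skip the "] " separator */
    size_t len = strcspn(start, "+ ");       /* symbol ends at '+' or ' ' */
    if (len == 0 || len >= buflen)
        return false;
    memcpy(buf, start, len);
    buf[len] = '\0';
    return true;
}

/* Return the index of the first (innermost) frame whose symbol equals
 * name, or -1 if absent. frames[0] is the innermost frame. */
static int find_frame(const char *const frames[], int n, const char *name)
{
    char sym[128];
    for (int i = 0; i < n; i++) {
        if (frame_symbol(frames[i], sym, sizeof sym) && strcmp(sym, name) == 0)
            return i;
    }
    return -1;
}

/* Heuristic: true if filemap_fdatawrite_range sits nearer the innermost
 * frame than rpc_async_release, i.e. writeback was entered from within
 * the RPC release path, as in the trace in this bug. */
static bool looks_like_reschedule_recursion(const char *const frames[], int n)
{
    int writeback = find_frame(frames, n, "filemap_fdatawrite_range");
    int release = find_frame(frames, n, "rpc_async_release");
    return writeback >= 0 && release >= 0 && writeback < release;
}
```

Fed the 17 frames above in order, looks_like_reschedule_recursion() returns true: filemap_fdatawrite_range is frame 7 and rpc_async_release is frame 12.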

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2089410/+subscriptions


