On 06.03.2026 at 14:57, Peter Lieven wrote:
> 
> 
> > > On 06.03.2026 at 14:41, Hanna Czenczek <[email protected]> wrote:
> > 
> > On 03.03.26 20:24, Peter Lieven wrote:
> >> 
> >> 
> >>> On 10.11.2025 at 16:48, Hanna Czenczek <[email protected]> wrote:
> >>> 
> >>> Like in “rbd: Run co BH CB in the coroutine’s AioContext”, drop the
> >>> completion flag, yield exactly once, and run the BH in the coroutine’s
> >>> AioContext.
> >>> 
> >>> (Can be reproduced with multiqueue by adding a usleep(100000) before the
> >>> `while (!task.complete)` loops.)
> >>> 
> >>> Like in “iscsi: Run co BH CB in the coroutine’s AioContext”, this makes
> >>> nfs_co_generic_bh_cb() trivial, so we can drop it in favor of just
> >>> calling aio_co_wake() directly.
> >>> 
> >>> Cc: [email protected] <mailto:[email protected]>
> >>> Signed-off-by: Hanna Czenczek <[email protected]>
> >>> ---
> >>> block/nfs.c | 41 ++++++++++++++++-------------------------
> >>> 1 file changed, 16 insertions(+), 25 deletions(-)
> >>> 
> >>> diff --git a/block/nfs.c b/block/nfs.c
> >>> index 0a7d38db09..1d3a34a30c 100644
> >>> --- a/block/nfs.c
> >>> +++ b/block/nfs.c
> >>> @@ -69,7 +69,6 @@ typedef struct NFSClient {
> >>> typedef struct NFSRPC {
> >>>     BlockDriverState *bs;
> >>>     int ret;
> >>> -    int complete;
> >>>     QEMUIOVector *iov;
> >>>     struct stat *st;
> >>>     Coroutine *co;
> >>> @@ -230,14 +229,6 @@ static void coroutine_fn 
> >>> nfs_co_init_task(BlockDriverState *bs, NFSRPC *task)
> >>>     };
> >>> }
> >>> 
> >>> -static void nfs_co_generic_bh_cb(void *opaque)
> >>> -{
> >>> -    NFSRPC *task = opaque;
> >>> -
> >>> -    task->complete = 1;
> >>> -    aio_co_wake(task->co);
> >>> -}
> >>> -
> >>> /* Called (via nfs_service) with QemuMutex held.  */
> >>> static void
> >>> nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
> >>> @@ -256,8 +247,16 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, 
> >>> void *data,
> >>>     if (task->ret < 0) {
> >>>         error_report("NFS Error: %s", nfs_get_error(nfs));
> >>>     }
> >>> -    replay_bh_schedule_oneshot_event(task->client->aio_context,
> >>> -                                     nfs_co_generic_bh_cb, task);
> >>> +
> >>> +    /*
> >>> +     * Safe to call: nfs_service(), which called us, is only run from 
> >>> the FD
> >>> +     * handlers, never from the request coroutine.  The request 
> >>> coroutine in
> >>> +     * turn will yield unconditionally.
> >>> +     * No need to release the lock, even if we directly enter the 
> >>> coroutine, as
> >>> +     * the lock is never re-taken after yielding.  (Note: If we do enter 
> >>> the
> >>> +     * coroutine, @task will probably be dangling once aio_co_wake() 
> >>> returns.)
> >>> +     */
> >>> +    aio_co_wake(task->co);
> >>> }
> >>> 
> >>> static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, int64_t 
> >>> offset,
> >>> @@ -278,9 +277,7 @@ static int coroutine_fn 
> >>> nfs_co_preadv(BlockDriverState *bs, int64_t offset,
> >>> 
> >>>         nfs_set_events(client);
> >>>     }
> >>> -    while (!task.complete) {
> >>> -        qemu_coroutine_yield();
> >>> -    }
> >>> +    qemu_coroutine_yield();
> >>> 
> >>>     if (task.ret < 0) {
> >>>         return task.ret;
> >>> @@ -328,9 +325,7 @@ static int coroutine_fn 
> >>> nfs_co_pwritev(BlockDriverState *bs, int64_t offset,
> >>> 
> >>>         nfs_set_events(client);
> >>>     }
> >>> -    while (!task.complete) {
> >>> -        qemu_coroutine_yield();
> >>> -    }
> >>> +    qemu_coroutine_yield();
> >>> 
> >>>     if (my_buffer) {
> >>>         g_free(buf);
> >>> @@ -358,9 +353,7 @@ static int coroutine_fn nfs_co_flush(BlockDriverState 
> >>> *bs)
> >>> 
> >>>         nfs_set_events(client);
> >>>     }
> >>> -    while (!task.complete) {
> >>> -        qemu_coroutine_yield();
> >>> -    }
> >>> +    qemu_coroutine_yield();
> >>> 
> >>>     return task.ret;
> >>> }
> >>> @@ -723,8 +716,8 @@ nfs_get_allocated_file_size_cb(int ret, struct 
> >>> nfs_context *nfs, void *data,
> >>>     if (task->ret < 0) {
> >>>         error_report("NFS Error: %s", nfs_get_error(nfs));
> >>>     }
> >>> -    replay_bh_schedule_oneshot_event(task->client->aio_context,
> >>> -                                     nfs_co_generic_bh_cb, task);
> >>> +    /* Safe to call, see nfs_co_generic_cb() */
> >>> +    aio_co_wake(task->co);
> >>> }
> >>> 
> >>> static int64_t coroutine_fn 
> >>> nfs_co_get_allocated_file_size(BlockDriverState *bs)
> >>> @@ -748,9 +741,7 @@ static int64_t coroutine_fn 
> >>> nfs_co_get_allocated_file_size(BlockDriverState *bs)
> >>> 
> >>>         nfs_set_events(client);
> >>>     }
> >>> -    while (!task.complete) {
> >>> -        qemu_coroutine_yield();
> >>> -    }
> >>> +    qemu_coroutine_yield();
> >>> 
> >>>     return (task.ret < 0 ? task.ret : st.st_blocks * 512);
> >>> }
> >>> -- 
> >>> 2.51.1
> >>> 
> >> 
> >> Hello Hanna,
> >> 
> >> I have been inactive in QEMU development for a long time and should 
> >> have checked this earlier, but it seems that this patch accidentally 
> >> broke libnfs usage, at least with QEMU command-line tools like 
> >> qemu-img. Again, sorry for not testing this earlier, but I have only 
> >> recently returned to QEMU development.
> >> 
> >> I found this glitch while working on a patch to add libnfs v6 support 
> >> to QEMU, to avoid dropping libnfs support.
> >> 
> >> A simple call like:
> >> 
> >> qemu-img create -f qcow2 nfs://nfsserver/image.qcow2 10G
> >> 
> >> hangs forever in the second write request.
> > 
> > I sent a patch for this, which is basically the same as yours below: 
> > https://lists.nongnu.org/archive/html/qemu-block/2026-01/msg00000.html
> 
> Thanks, I will rebase my NFS patches for libnfs v6 on top of that.
> 
> Do we need the same fix for iSCSI?

iscsi drops the lock temporarily, so this deadlock won't happen there.
What might be worth double-checking is whether temporarily dropping the
lock still gives enough guarantees for iscsi_service() and its callers
to work correctly.

I don't think I fully understand what the lock is protecting.
Apparently it comes from the mass conversion away from the original
automatic locking in AioContext callbacks, so nobody really seems to
have thought about it beyond asserting that it's as safe as before.
Maybe locking more locally could be a better alternative that would be
easier to understand.

Kevin
