Hi Brian,
sorry for the late reply, I'm just back from vacation and working through
my mail.
On 8/4/25 01:33, Brian Song wrote:
>
>
> On 2025-08-01 12:09 p.m., Brian Song wrote:
>> Hi Bernd,
>>
>> We are currently working on implementing termination support for fuse-
>> over-io_uring in QEMU, and right now we are focusing on how to clean up
>> in-flight SQEs properly. Our main question is about how well the kernel
>> supports robust cancellation for these fuse-over-io_uring SQEs. Does it
>> actually implement cancellation beyond destroying the io_uring queue?
>>
>> In QEMU FUSE export, we need a way to quickly and cleanly detach from
>> the event loop and cancel any pending SQEs when an export is no longer
>> in use. Ideally, we want to avoid the more drastic measure of having to
>> close the entire /dev/fuse fd just to gracefully terminate outstanding
>> operations.
>>
>> We are not sure if there's an existing code path that supports async
>> cancel for these in-flight SQEs in the fuse-over-io_uring setup, or if
>> additional callbacks might be needed to fully integrate with the
>> kernel's async cancel mechanism. We also realized libfuse manages
>> shutdowns differently, typically by signaling a thread via eventfd
>> rather than relying on async cancel.
>>
>> Would love to hear your thoughts or suggestions on this!
>>
>> Thanks,
>> Brian
>
> I looked into the kernel codebase and came up with some initial ideas,
> which might not be entirely accurate:
>
> The IORING_OP_ASYNC_CANCEL operation can only cancel io_uring ring
> resources and a limited set of request types. It does not clean up
> resources related to fuse-over-io_uring, such as in-use entries.
> IORING_OP_ASYNC_CANCEL
> -> submit/enter
> -> io_uring/opdef.c:: .issue = io_async_cancel,
> -> __io_async_cancel
> -> io_try_cancel ==> can only cancel a few types of requests
>
>
> Currently, full cleanup of both io_uring and FUSE data structures for
> fuse-over-io_uring only happens in two cases (because we mark these
> SQEs cancelable every time we commit_and_fetch, as described below):
> 1. When the FUSE daemon exits (exit syscall)
> 2. During execve, which triggers the kernel path:
>
> io_uring_files_cancel =>
> io_uring_try_cancel_uring_cmd =>
> file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL | IO_URING_F_COMPLETE_DEFER)
>
>
>
> Below is a state diagram (mermaid graph) of a fuse_uring entry inside
> the kernel:
>
> graph TD
> A["Userspace daemon"] --> B["FUSE_IO_URING_CMD_REGISTER<br/>Register buffer"]
> B --> C["Create fuse_ring_ent"]
> C --> D["State: FRRS_AVAILABLE<br/>Added to ent_avail_queue"]
>
> E["FUSE filesystem operation"] --> F["Generate FUSE request"]
> F --> G["fuse_uring_queue_fuse_req()"]
> G --> H{"Check ent_avail_queue"}
>
> H -->|Entry available| I["Take entry from queue<br/>Assign to FUSE request"]
> H -->|No entry available| J["Request goes to fuse_req_queue and waits"]
>
> I --> K["fuse_uring_dispatch_ent()"]
> K --> L["State: FRRS_USERSPACE<br/>Move to ent_in_userspace"]
> L --> M["Notify userspace to process"]
>
> N["Process exit / daemon termination"] --> O["io_uring_try_cancel_uring_cmd() <br/> >> NOTE Since we marked the entry IORING_URING_CMD_CANCELABLE <br/> in the previous fuse_uring_cmd, try_cancel_uring_cmd will call <br/> fuse_uring_cmd to 'delete' it <<"]
> O --> P["fuse_uring_cancel()"]
> P --> Q{"Is entry state AVAILABLE?"}
>
> Q -->|Yes| R[">> equivalent to 'delete' << Directly change to USERSPACE<br/>Move to ent_in_userspace"]
> Q -->|No| S["Do nothing"]
>
> R --> T["io_uring_cmd_done(-ENOTCONN)"]
> T --> U["Entry is 'disguised' as completed<br/>Will no longer handle new FUSE requests"]
>
> V["Practical effects of cancellation:"] --> W["1. Prevent new FUSE requests from using this entry<br/>2. Release io_uring command resources<br/>3. Does not affect already assigned FUSE requests"]
>
>
>
> When the kernel is waiting for VFS requests and the corresponding entry
> is idle, its state is FRRS_AVAILABLE. Once a request is handed off to
> the userspace daemon, the entry's state transitions to FRRS_USERSPACE.
>
> The fuse_uring_cmd function handles the COMMIT_AND_FETCH operation. If a
> cmd call carries the IO_URING_F_CANCEL flag, fuse_uring_cancel is
> invoked to mark the entry state as FRRS_USERSPACE, making it unavailable
> for future requests from the VFS.
>
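
A minimal user-space model of the cancel transition described above, written in C. This is only a sketch under stated assumptions, not the kernel code: model_ent, model_cancel and the cmd_completed/cmd_result fields are hypothetical names, and only FRRS_AVAILABLE, FRRS_USERSPACE and the -ENOTCONN result are taken from the description in this thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; only the two state names come from the thread. */
enum model_ent_state {
    FRRS_AVAILABLE,  /* idle, waiting for a FUSE request */
    FRRS_USERSPACE,  /* handed to the daemon, or retired by cancel */
};

struct model_ent {
    enum model_ent_state state;
    bool cmd_completed;  /* stands in for io_uring_cmd_done() having run */
    int  cmd_result;     /* result the CQE would carry */
};

/*
 * Models the cancel path as described: only an idle entry is touched.
 * Returns true if this call retired the entry.
 */
static bool model_cancel(struct model_ent *ent)
{
    if (ent->state != FRRS_AVAILABLE)
        return false;            /* request already assigned: do nothing */

    ent->state = FRRS_USERSPACE; /* no longer eligible for new requests */
    ent->cmd_completed = true;   /* the command is completed ... */
    ent->cmd_result = -ENOTCONN; /* ... with -ENOTCONN */
    return true;
}
```

In this model, cancelling an idle entry completes its command with -ENOTCONN, while an entry that already carries a request is left alone, which matches the "does not affect already assigned FUSE requests" point in the graph.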
> If the IORING_URING_CMD_CANCELABLE flag is not yet set, we call
> fuse_uring_prepare_cancel before committing and fetching to mark the
> entry as IORING_URING_CMD_CANCELABLE. This indicates that if the daemon
> exits or an execve happens during fetch, the kernel can call
> io_uring_try_cancel_uring_cmd to safely clean up these SQEs/CQEs and
> the related FUSE resources.
>
> Back to our previous issue, when deleting a FUSE export in QEMU, we hit
> a crash due to an invalid CQE handler. This happened because the SQEs we
> previously submitted hadn't returned yet by the time we shut down and
> deleted the export.
>
> To avoid this, we need to ensure that no further CQEs are returned and
> no CQE handler is triggered. We need to either:
>
> * Prevent any further user operations before calling blk_exp_close_all
>
> or
>
> * Require userspace to trigger a few specific operations that cause
> the kernel to return all outstanding CQEs, after which the daemon can
> send an io_uring_cmd with the IO_URING_F_CANCEL flag to mark all entries
> as unavailable (FRRS_USERSPACE), a "delete operation", ensuring the
> kernel won't assign them to future VFS requests.
>
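
Independent of which kernel mechanism ends up returning the CQEs, the "no CQE handler runs after the export is gone" requirement above could be modelled on the userspace side by counting in-flight SQEs and deferring the free until the count reaches zero. The following C sketch is an assumption about one possible structure, not QEMU's export code: model_export and all model_* functions are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; this is not QEMU's export code. */
struct model_export {
    size_t in_flight;  /* SQEs submitted whose CQEs have not arrived yet */
    bool   closing;    /* set once shutdown has been requested */
    bool   freed;      /* stands in for actually releasing the export */
};

/* Release the export only when closing and nothing is outstanding. */
static void model_maybe_free(struct model_export *exp)
{
    if (exp->closing && exp->in_flight == 0)
        exp->freed = true;
}

/* Called when an SQE is submitted; refuses new work once closing. */
static bool model_submit(struct model_export *exp)
{
    if (exp->closing)
        return false;
    exp->in_flight++;
    return true;
}

/* Called from the CQE handler, whatever the result code was. */
static void model_complete(struct model_export *exp)
{
    exp->in_flight--;
    model_maybe_free(exp);
}

/* Called when the export is deleted; the free is deferred if needed. */
static void model_close(struct model_export *exp)
{
    exp->closing = true;
    model_maybe_free(exp);
}
```

With this structure, model_close() never releases the export while a CQE is still pending, so the last model_complete() is what triggers the free. It still relies on something making the kernel return the outstanding CQEs in the first place.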
>
>
I have to admit that I'm confused about why you can't use umount. Isn't
that the most graceful way to shut down a connection?
If you need another custom way for some reason, we probably need
to add it.
Thanks,
Bernd