On Tue, Jul 01, 2025 at 01:44:34PM +0200, Hanna Czenczek wrote:
> FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs
> (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)).
>
> We can use this to implement multi-threading.
>
> For configuration, we don't need any more information beyond the simple
> array provided by the core block export interface: The FUSE kernel
> driver feeds these FDs in a round-robin fashion, so all of them are
> equivalent and we want to have exactly one per thread.
>
> These are the benchmark results when using four threads (compared to a
> single thread); note that fio still only uses a single job, but
> performance can still be improved because of said round-robin usage for
> the queues. (Not in the sync case, though, in which case I guess it
> just adds overhead.)
>
> file:
>   read:
>     seq aio:   264.8k ±0.8k (+120 %)
>     rand aio:  143.8k ±0.4k (+ 27 %)
>     seq sync:   49.9k ±0.5k (- 5 %)
>     rand sync:  10.3k ±0.1k (- 1 %)
>   write:
>     seq aio:   226.6k ±2.1k (+184 %)
>     rand aio:  225.9k ±1.8k (+186 %)
>     seq sync:   36.9k ±0.6k (- 11 %)
>     rand sync:  36.9k ±0.2k (- 11 %)
> null:
>   read:
>     seq aio:   315.2k ±11.0k (+18 %)
>     rand aio:  300.5k ±10.8k (+14 %)
>     seq sync:  114.2k ± 3.6k (-16 %)
>     rand sync: 112.5k ± 2.8k (-16 %)
>   write:
>     seq aio:   222.6k ±6.8k (-21 %)
>     rand aio:  220.5k ±6.8k (-23 %)
>     seq sync:  117.2k ±3.7k (-18 %)
>     rand sync: 116.3k ±4.4k (-18 %)
>
> (I don't know what's going on in the null-write AIO case, sorry.)
>
> Here are the results for numjobs=4:
>
> "Before", i.e. without multithreading in QSD/FUSE (results compared to
> numjobs=1):
>
> file:
>   read:
>     seq aio:   104.7k ± 0.4k (- 13 %)
>     rand aio:  111.5k ± 0.4k (- 2 %)
>     seq sync:   71.0k ±13.8k (+ 36 %)
>     rand sync:  41.4k ± 0.1k (+297 %)
>   write:
>     seq aio:    79.4k ±0.1k (- 1 %)
>     rand aio:   78.6k ±0.1k (± 0 %)
>     seq sync:   83.3k ±0.1k (+101 %)
>     rand sync:  82.0k ±0.2k (+ 98 %)
> null:
>   read:
>     seq aio:   260.5k ±1.5k (- 2 %)
>     rand aio:  260.1k ±1.4k (- 2 %)
>     seq sync:  291.8k ±1.3k (+115 %)
>     rand sync: 280.1k ±1.7k (+115 %)
>   write:
>     seq aio:   280.1k ±1.7k (± 0 %)
>     rand aio:  279.5k ±1.4k (- 3 %)
>     seq sync:  306.7k ±2.2k (+116 %)
>     rand sync: 305.9k ±1.8k (+117 %)
>
> (As probably expected, little difference in the AIO case, but great
> improvements in the sync case because it kind of gives it an artificial
> iodepth of 4.)
>
> "After", i.e. with four threads in QSD/FUSE (now results compared to the
> above):
>
> file:
>   read:
>     seq aio:   193.3k ± 1.8k (+ 85 %)
>     rand aio:  329.3k ± 0.3k (+195 %)
>     seq sync:   66.2k ±13.0k (- 7 %)
>     rand sync:  40.1k ± 0.0k (- 3 %)
>   write:
>     seq aio:   219.7k ±0.8k (+177 %)
>     rand aio:  217.2k ±1.5k (+176 %)
>     seq sync:   92.5k ±0.2k (+ 11 %)
>     rand sync:  91.9k ±0.2k (+ 12 %)
> null:
>   read:
>     seq aio:   706.7k ±2.1k (+171 %)
>     rand aio:  714.7k ±3.2k (+175 %)
>     seq sync:  431.7k ±3.0k (+ 48 %)
>     rand sync: 435.4k ±2.8k (+ 50 %)
>   write:
>     seq aio:   746.9k ±2.8k (+167 %)
>     rand aio:  749.0k ±4.9k (+168 %)
>     seq sync:  420.7k ±3.1k (+ 37 %)
>     rand sync: 419.1k ±2.5k (+ 37 %)
>
> So this helps mainly for the AIO cases, but also in the null sync cases,
> because null is always CPU-bound, so more threads help.
>
> Signed-off-by: Hanna Czenczek <[email protected]>
> ---
> block/export/fuse.c | 205 ++++++++++++++++++++++++++++++++++----------
> 1 file changed, 159 insertions(+), 46 deletions(-)

Reviewed-by: Stefan Hajnoczi <[email protected]>
