On 21.10.2024 at 15:21, Sam Li wrote:
> Kevin Wolf <kw...@redhat.com> wrote on Fri, 18 Oct 2024 at 16:37:
> >
> > On 04.10.2024 at 12:41, Sam Li wrote:
> > > When the file-posix driver emulates append write, it holds the lock
> > > whenever accessing wp, which limits the IO queue depth to one.
> > >
> > > The write IO flow can be optimized to allow concurrent writes. The lock
> > > is held in two cases:
> > > 1. Assuming that the write IO succeeds, update the wp before issuing the
> > >    write.
> > > 2. If the write IO fails, report that zone and use the reported value
> > >    as the current wp.
> >
> > What happens with the concurrent writes that started later and may not
> > have completed yet? Can we really just reset to the reported value
> > before all other requests have completed, too?
> >
> > > Signed-off-by: Sam Li <faithilike...@gmail.com>
> > > ---
> > >  block/file-posix.c | 57 ++++++++++++++++++++++++++++++----------------
> > >  1 file changed, 38 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 90fa54352c..a65a23cb2c 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -2482,18 +2482,46 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
> > >      BDRVRawState *s = bs->opaque;
> > >      RawPosixAIOData acb;
> > >      int ret;
> > > -    uint64_t offset = *offset_ptr;
> > > +    uint64_t end_offset, end_zone, offset = *offset_ptr;
> > > +    uint64_t *wp;
> >
> > Without CONFIG_BLKZONED, these are unused variables and break the build.
> >
> > They are only used in the first CONFIG_BLKZONED block, so you could just
> > declare them locally there.
>
> Thanks! Will do.
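[Editor's note: leaving Kevin's open question about concurrent failures aside, the optimistic write-pointer flow the commit message describes could be sketched in plain C roughly as below. All names (`Zone`, `zone_append`, `dev_wp`) are invented for illustration, and a pthread mutex stands in for QEMU's CoMutex so the sketch is self-contained; this is not the actual patch code.]

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;   /* stands in for wps->colock */
    uint64_t wp;            /* write pointer of one zone */
} Zone;

/*
 * Hypothetical append of 'len' bytes. 'fail' simulates an I/O error and
 * 'dev_wp' simulates the wp value a zone report would return.
 */
static int zone_append(Zone *z, uint64_t len, bool fail, uint64_t dev_wp)
{
    uint64_t offset;

    /* Case 1: assume the write succeeds and advance wp up front, so the
     * lock does not have to be held across the whole I/O. */
    pthread_mutex_lock(&z->lock);
    offset = z->wp;
    z->wp += len;
    pthread_mutex_unlock(&z->lock);

    /* ... the actual write at 'offset' would be submitted here, with the
     * lock released, allowing other appends to proceed concurrently ... */
    (void)offset;

    if (fail) {
        /* Case 2: on error, fall back to the wp reported by the device. */
        pthread_mutex_lock(&z->lock);
        z->wp = dev_wp;
        pthread_mutex_unlock(&z->lock);
        return -1;
    }
    return 0;
}
```

Note that this sketch deliberately ignores the race Kevin asks about: other in-flight appends may have advanced wp past the reported value before the failed request resets it.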
> > >
> > >      if (fd_open(bs) < 0)
> > >          return -EIO;
> > >  #if defined(CONFIG_BLKZONED)
> > >      if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
> > >          bs->bl.zoned != BLK_Z_NONE) {
> > > -        qemu_co_mutex_lock(&bs->wps->colock);
> > > -        if (type & QEMU_AIO_ZONE_APPEND) {
> > > -            int index = offset / bs->bl.zone_size;
> > > -            offset = bs->wps->wp[index];
> > > +        BlockZoneWps *wps = bs->wps;
> > > +        int index = offset / bs->bl.zone_size;
> > > +
> > > +        qemu_co_mutex_lock(&wps->colock);
> >
> > This is preexisting, but what is the reason for using coroutine locks
> > here? There doesn't seem to be any code running under the lock that can
> > yield, so a normal mutex might be more efficient.
>
> Using coroutine locks is to avoid blocking in coroutines. QemuMutex
> blocks the thread when the lock is held instead of yielding.
Right, but usually you only have to wait for a very short time until the
mutex is released again, and CoMutexes are more expensive then. You
absolutely do need to use a CoMutex when the coroutine can yield in the
critical section, but if it can't, the CoMutex is often worse.

Though as I said here...

> > Hm... Looking a bit closer, get_zones_wp() could probably be a
> > coroutine_fn and call hdev_co_ioctl() instead of calling ioctl()
> > directly in order to avoid blocking.

...we should probably use a coroutine version of ioctl() instead of the
blocking one, and then you do need the CoMutex.

> > > +        wp = &wps->wp[index];
> >
> > Also preexisting, but who guarantees that index is within the bounds? A
> > stacked block driver may try to grow the file size.
>
> Right. It needs to be checked if it's over nr_zones.

Can you send a separate fix for this, please? (It can be part of the same
patch series, though.)

> > > +        if (!BDRV_ZT_IS_CONV(*wp)) {
> > > +            if (type & QEMU_AIO_WRITE && offset != *wp) {
> > > +                error_report("write offset 0x%" PRIx64 " is not equal to the wp"
> > > +                             " of zone[%d] 0x%" PRIx64 "", offset, index, *wp);
> >
> > We can't error_report() in an I/O path that can be triggered by the
> > guest; it could result in unbounded log file growth.
>
> Those error messages show in the error path and are good for debugging
> zoned device emulation.
>
> I was wondering if there is a better approach to print errors? Use
> error_report_once() to reduce the log?

If it's for debugging, I think trace points are best.

Kevin
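[Editor's note: the missing bounds check discussed in the thread, i.e. rejecting a zone index at or past nr_zones before indexing the wp array, could look roughly like the standalone sketch below. The function name and types are invented; the real fix would operate on bs->wps in QEMU.]

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a request offset to a zone index, returning -1
 * if the index would fall outside the wp array (e.g. when a stacked block
 * driver tries to grow the file past the device's last zone).
 */
static int zone_wp_index(uint64_t offset, uint64_t zone_size,
                         unsigned int nr_zones)
{
    uint64_t index = offset / zone_size;

    if (index >= nr_zones) {
        return -1;          /* out of range, caller should fail the request */
    }
    return (int)index;
}
```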