On 28/06/2016 23:01, Paolo Bonzini wrote: > > > On 24/06/2016 17:06, Denis V. Lunev wrote: >> From: Evgeny Yakovlev <eyakov...@virtuozzo.com> >> >> Some guests (win2008 server for example) do a lot of unnecessary >> flushing when underlying media has not changed. This adds additional >> overhead on host when calling fsync/fdatasync. >> >> This change introduces a dirty flag in BlockDriverState which is set >> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to >> avoid unnesessary flushing when storage is clean. >> >> The problem with excessive flushing was found by a performance test >> which does parallel directory tree creation (from 2 processes). >> Results improved from 0.424 loops/sec to 0.432 loops/sec. >> Each loop creates 10^3 directories with 10 files in each. >> >> Signed-off-by: Evgeny Yakovlev <eyakov...@virtuozzo.com> >> Signed-off-by: Denis V. Lunev <d...@openvz.org> >> CC: Kevin Wolf <kw...@redhat.com> >> CC: Max Reitz <mre...@redhat.com> >> CC: Stefan Hajnoczi <stefa...@redhat.com> >> CC: Fam Zheng <f...@redhat.com> >> CC: John Snow <js...@redhat.com> >> --- >> block.c | 1 + >> block/dirty-bitmap.c | 3 +++ >> block/io.c | 19 +++++++++++++++++++ >> include/block/block_int.h | 2 ++ >> 4 files changed, 25 insertions(+) >> >> diff --git a/block.c b/block.c >> index f4648e9..e36f148 100644 >> --- a/block.c >> +++ b/block.c >> @@ -2582,6 +2582,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset) >> ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS); >> bdrv_dirty_bitmap_truncate(bs); >> bdrv_parent_cb_resize(bs); >> + bs->dirty = true; /* file node sync is needed after truncate */ >> } >> return ret; >> } >> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c >> index 4902ca5..54e0413 100644 >> --- a/block/dirty-bitmap.c >> +++ b/block/dirty-bitmap.c >> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t >> cur_sector, >> } >> hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors); >> } >> + >> + /* Set global block driver dirty flag even if bitmap is disabled */ >> + bs->dirty = true; >> } >> >> /** >> diff --git a/block/io.c b/block/io.c >> index 7cf3645..8078af2 100644 >> --- a/block/io.c >> +++ b/block/io.c >> @@ -2239,6 +2239,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) >> goto flush_parent; >> } >> >> + /* Check if storage is actually dirty before flushing to disk */ >> + if (!bs->dirty) { >> + /* Flush requests are appended to tracked request list in order so >> that >> + * most recent request is at the head of the list. Following code >> uses >> + * this ordering to wait for the most recent flush request to >> complete >> + * to ensure that requests return in order */ >> + BdrvTrackedRequest *prev_req; >> + QLIST_FOREACH(prev_req, &bs->tracked_requests, list) { >> + if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) { >> + continue; >> + } >> + >> + qemu_co_queue_wait(&prev_req->wait_queue); >> + break; >> + } >> + goto flush_parent; > > Can you just have a CoQueue specific to flushes, where a completing > flush does a restart_all on the CoQueue?
Something like this: current_gen = bs->write_gen; if (bs->flush_started_gen >= current_gen) { while (bs->flushed_gen < current_gen) { qemu_co_queue_wait(&bs->flush_queue); } return; } bs->flush_started_gen = current_gen; ... if (current_gen < bs->flushed_gen) { bs->flushed_gen = current_gen; qemu_co_queue_restart_all(&bs->flush_queue); } Paolo > Flushes are never serialising, so there's no reason for them to be in > tracked_requests (I posted patches a while ago that instead use a simple > atomic counter, but they will only be in 2.8).