On Thu, May 09, 2019 at 10:33:19AM +0800, Peter Xu wrote:
> On Wed, May 08, 2019 at 01:55:07PM +0200, Paolo Bonzini wrote:
> > On 08/05/19 06:39, Peter Xu wrote:
> > >> The disadvantage of this is that you won't clear in the kernel those
> > >> dirty bits that come from other sources (e.g. vhost or
> > >> address_space_map).  This can lead to double-copying of pages.
> > >>
> > >> Migration already makes a local copy in rb->bmap, and
> > >> memory_region_snapshot_and_clear_dirty can also do the clear.  Would it
> > >> be possible to invoke the clear using rb->bmap instead of the KVMSlot's
> > >> new bitmap?
> > > Actually that's what I did in the first version before I posted the
> > > work, but I noticed that there seems to be a race condition in the
> > > design.  The problem is that we have multiple copies of the same dirty
> > > bitmap from KVM, and the race can happen between those multiple users
> > > (the users' bitmaps can be merged versions containing KVM and other
> > > sources like vhost, address_space_map, etc., but let's keep it simple
> > > and leave those out for now).
> > I see now.  And in fact the same double-copying inefficiency happens
> > already without this series, so you are improving the situation anyway.
> > Have you done any kind of benchmarking already?
> Not yet.  I posted the series for some initial reviews first before
> moving on with performance tests.
>
> My plan for the test scenario is:
>
> - find a guest with relatively large memory (I would guess memory of
>   64G or more is needed to show a big difference),
>
> - run a random dirty-memory workload over most of the memory, with
>   dirty rate X Bps,
>
> - set the migration bandwidth to Y Bps (Y should be bigger than X, but
>   not by much; e.g. X=800M and Y=1G to emulate a 10G NIC with a
>   workload that can still converge with precopy only) and start the
>   precopy migration,
>
> - measure total migration time with CLEAR_LOG on & off.
> We should expect two things from the guest with CLEAR_LOG: (1) it does
> not hang during log_sync, and (2) the migration completes faster.
Some updates on performance numbers.

Summary: the ideal case below shows a ~40% (or even bigger) reduction
in total migration time for the same VM with the same workload.  In
other words, it could be seen as ~40% faster than before.

Test environment: 13G guest, 10G test memory (so 3G is left
untouched), dirty rate 900MB/s, bandwidth 10Gbps (to emulate an ixgbe
NIC), downtime 100ms.

IO pattern: I pre-fault all of the 10G test memory, then do random
writes over it at a constant dirty rate (900MB/s, as mentioned), using
the command "mig_mon mm_dirty 10240 900 random" [1].  The migration
runs while the IOs are in flight.

Here are the total migration times for such a VM (for each scenario I
ran the migration 5 times and took the average):

|--------------+---------------------+-------------|
| scenario     | migration times (s) | average (s) |
|--------------+---------------------+-------------|
| no CLEAR_LOG | 55, 54, 56, 74, 54  |          58 |
| 1G chunk     | 40, 39, 41, 39, 40  |          40 |
| 128M chunk   | 38, 40, 37, 40, 38  |          38 |
| 16M chunk    | 42, 40, 38, 41, 38  |          39 |
| 1M chunk     | 37, 40, 36, 40, 39  |          38 |
|--------------+---------------------+-------------|

The first scenario, "no CLEAR_LOG", is the master branch, which still
uses GET_DIRTY_LOG only.  The other four scenarios all use the new
CLEAR_LOG interface, aka, this series.

The results show that a 128M chunk size seems to be a better default
value than 1G (which this series currently uses).  I'll adjust that
accordingly when I post the next version.

[1] https://github.com/xzpeter/clibs/blob/master/bsd/mig_mon/mig_mon.c

Regards,

-- 
Peter Xu