So, after some testing, I want to provide a summary
about this.

There are 3 cache modes in qemu for block devices:
none, writeback, and writethrough (default).

The series of patches we're talking about changed
all metadata updates to be synchronous.

The change in question does _not_ affect writethrough
mode, since there all writes were always synchronous
anyway.  This contradicts the information in the
first message that started this bug report.  Alexander
Loob said:

  Changing the cache value to cache=none or writethrough
  has no effect to the write performance.

This is incorrect.  Once again, the change does not
affect the default writethrough cache mode, since
it was synchronous before, and the change is a no-op
for that mode.  Note that this is the default cache
mode, so the change actually does not affect most
users.

But the other cache modes, none and writeback, are both
affected very negatively: write performance drops by a
factor of about 10 or more on some workloads.

This, again, needs some clarification.  The change only
affects metadata updates, i.e., roughly speaking, new
allocations in the qcow2 file.  So if the guest is writing
to an already allocated location, there's no penalty from
the changes in question, since in that case no metadata
updates happen.
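
A rough sketch of that point, as a toy Python model and
not qemu's actual code.  The function name and the
"one sync per newly allocated cluster" rule are
assumptions made purely for illustration:

```python
# Toy model, not QEMU code: count synchronous metadata
# updates caused by a sequence of guest writes.
def metadata_syncs(writes, allocated):
    """Return how many writes hit a not-yet-allocated cluster.

    writes    -- iterable of cluster indices the guest writes to
    allocated -- cluster indices that already have storage
    """
    allocated = set(allocated)  # work on a copy
    syncs = 0
    for cluster in writes:
        if cluster not in allocated:
            allocated.add(cluster)  # new allocation -> metadata update
            syncs += 1
    return syncs


# Rewriting existing data: no metadata updates, no penalty.
print(metadata_syncs([0, 1, 2, 1, 0], allocated={0, 1, 2}))  # 0
# Filling a fresh image: each first touch costs one sync.
print(metadata_syncs([0, 1, 2, 1, 0], allocated=set()))      # 3
```

In this model only the second workload pays for the
synchronous updates.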

This change affects qcow2 snapshots very significantly.
This is because after taking a snapshot, _all_ blocks
are marked as copy-on-write, so _all_ writes will require
both allocating a new block and decrementing the reference
count of the old block.  So there are 2x more metadata
updates going on for every write after taking a snapshot,
hence the use case in question (using qcow2 snapshots)
is affected much more seriously than others.
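
The same kind of toy Python model (again not qemu code)
can show the doubling.  It assumes that only the first
write to each cluster after the snapshot triggers
copy-on-write, and that this costs exactly two metadata
updates; the function name is made up for this sketch:

```python
# Toy model, not QEMU code: metadata updates for guest
# writes issued right after a qcow2 snapshot was taken.
def updates_after_snapshot(writes):
    """Count metadata updates for writes to copy-on-write clusters."""
    copied = set()   # clusters already copied since the snapshot
    updates = 0
    for cluster in writes:
        if cluster not in copied:
            # 1) allocate a new cluster
            # 2) decrement the refcount of the old cluster
            updates += 2
            copied.add(cluster)
    return updates


print(updates_after_snapshot([0, 1, 2, 1, 0]))  # 6
```

Three distinct clusters are touched, so the model counts
six updates, twice what a plain allocation of the same
clusters would need.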

Another case where cache=writeback is often used is an
OS install, to speed up the installation process when it
is obvious that data integrity isn't important (in case of
a crash it's just as easy to re-run the installation
procedure from the beginning, no big deal).  This use case
is obviously affected too, with install time increasing by
up to 5 times.

And yet another place where this problem hit us is the
qemu-img (kvm-img) utility.  When it creates a qcow2
image, it uses writeback cache.  On my home machine,
`kvm-img convert -O qcow2 f.raw f.qcow2' for an 8Gb
win7 image now requires 48 minutes to complete,
instead of the usual ~4m.
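
For a sense of scale, a back-of-the-envelope calculation
in Python from those two timings.  It assumes "8Gb" means
8 GiB of input data and treats "~4m" as exactly 4
minutes, so the results are rough estimates only:

```python
# Figures quoted above; the size unit and the exact
# 4-minute baseline are assumptions for this estimate.
IMAGE_MIB = 8 * 1024   # "8Gb" read as 8 GiB
SLOW_MIN = 48          # convert time with the change
FAST_MIN = 4           # "~4m" before the change


def throughput_mib_s(size_mib, minutes):
    """Average throughput in MiB/s for a run of the given length."""
    return size_mib / (minutes * 60)


slowdown = SLOW_MIN / FAST_MIN
print(f"slowdown: {slowdown:.0f}x")
print(f"before:   {throughput_mib_s(IMAGE_MIB, FAST_MIN):.1f} MiB/s")
print(f"after:    {throughput_mib_s(IMAGE_MIB, SLOW_MIN):.1f} MiB/s")
```

That is about a 12x slowdown (roughly 34 MiB/s down to
under 3 MiB/s), in line with the roughly tenfold drop
mentioned earlier.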

So, the change does not affect the default (and probably
most commonly used) operation of kvm itself.  It makes
unsafe operations (cache=writeback) partially safe at
the cost of a large decrease in speed, and it makes some
utilities almost unusable.

I'm reverting it all for the next debian release.
It may even be reverted upstream for the upcoming 0.13,
but I'm not sure.  This change needs some work before
it is acceptable.

/mjt


