On Tue, Jul 11, 2017, at 11:45, Martin Pitt wrote:
> Maybe there is some trick ("sort" the blocks on the file system or so) to
> make qcow files rsyncable better. But in conclusion, right now this delta
> approach doesn't buy us much.
The problem here is qcow's compression. Even when changing the image only a
little, the compression will change almost everything in the resulting file.

casync compresses blocks after checksumming them, so it's possible to use it
on raw images.
After hearing Lennart's talk about this in Berlin the other day, I decided
it's worth giving it another look.
To test it, I downloaded the last three fedora-26 images, which should have
enough in common to see some benefits:
image   commit     date
------------------------
c758    83abe8a8   Jul  1
5545    2623987a   Jul 10
4dad    6d6be503   Jul 12
And uncompressed them with
$ qemu-img convert <image>.qcow2 <image>.raw
I then ran `casync make` with the same store on each of them, starting with
the oldest. I repeated the experiment for a couple of different chunk sizes,
with a new store for each chunk size.
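For reference, each run looked roughly like this (file names are shortened
and made up here, and the exact --chunk-size spelling is from memory; casync
also accepts a min:avg:max triple):

$ casync make --store=f26.castr --chunk-size=65536 c758.caibx c758.raw
$ casync make --store=f26.castr --chunk-size=65536 5545.caibx 5545.raw
$ casync make --store=f26.castr --chunk-size=65536 4dad.caibx 4dad.raw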
chunk   image   reuse   size   files   time
---------------------------------------------
64k     c758     40%    1.1G    74k    17 min
        5545     74%    1.5G   100k     9 min
        4dad     93%    1.7G   107k     4 min
---------------------------------------------
512k    c758     37%    0.9G    11k    18 min
        5545     64%    1.6G    17k    12 min
        4dad     84%    1.8G    20k     7 min
---------------------------------------------
1M      c758     34%    0.9G     5k    19 min
        5545     56%    1.6G     9k    14 min
        4dad     77%    2.0G    11k     9 min
---------------------------------------------
8M      c758     38%    0.9G    355    24 min
        5545     37%    1.8G    713    24 min
        4dad     40%    2.6G     1k    23 min
reuse: the percentage of chunks that were reused (as reported by casync)
size: the size of the store on disk
files: the number of files in the store
time: runtime on my system
The baseline for reused chunks for these images seems to be in the high 30%.
I guess these are just empty blocks.
We're definitely not gaining anything at a chunk size of 8M. The store has
exactly the same size, and generating it takes the same time, as compressing
the image directly with `xz`. The store also grows almost linearly with the
number of images, so not many chunks seem to be reused between images.
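(For reference, the direct compression I'm comparing against is simply
something like the following; the exact flags are just what I'd reach for,
not necessarily what our image build uses:)

$ xz --keep --threads=0 <image>.raw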
The smaller chunk sizes seem to present a nice balance between the number of
files and reused chunks. Like Martin already mentioned, the bottleneck might
be HTTP requests.
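To make that concrete: on the consuming side, fetching an image would look
roughly like this (URL and file names are hypothetical), and as far as I
understand every chunk that isn't already covered by the local cache or the
seed becomes a separate HTTP request against the store:

$ casync extract --store=https://images.example.com/f26.castr \
      --seed=old-image.raw 4dad.caibx 4dad.raw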
Before we spend any more time on this, let's find out if this could give us
any tangible benefits that justify the additional complexity and developer
time. It saves quite some disk space and bandwidth (for both developers and
test machines), at the expense of long image creation times when the store
is empty.
I don't know the space and bandwidth constraints of our test runners. So I'm
not sure if this is really worth pursuing. What do you think?
Cheers
Lars
P.S.: I also ran the same tests on the compressed qcow images for comparison.
It behaved as expected: the highest chunk reuse achieved was 21% for the 64k
chunk size, but usually it was *much* lower than that. Not worth it.
I'll run another batch of tests tonight, which puts all current images into
one store. Let's see how much saving we'll get between different operating
systems.