On Tue, Jul 11, 2017, at 11:45, Martin Pitt wrote:
> Maybe there is some trick ("sort" the blocks on the file system or so) to
> make qcow files rsyncable better. But in conclusion, right now this delta
> approach doesn't buy us much.
The problem here is qcow's compression. Even when changing the image only a
little, the compression will change almost everything in the resulting file.

casync compresses blocks after checksumming them, so it's possible to use it
on raw images.
After hearing Lennart's talk about this in Berlin the other day, I decided
it's worth giving it another look.
To test it, I downloaded the last three fedora-26 images, which should have
enough in common to see some benefits:
image   commit     date
------------------------
c758    83abe8a8   Jul  1
5545    2623987a   Jul 10
4dad    6d6be503   Jul 12
And uncompressed them with
$ qemu-img convert <image>.qcow2 <image>.raw
I then ran `casync make` with the same store on each of them, starting with
the oldest. I repeated the experiment for a couple of different chunk sizes,
with a new store for each chunk size.
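For reference, each run looked roughly like this (file names are shortened
and made up here, and the exact --chunk-size spelling is from memory; casync
also accepts a min:avg:max triple):

$ casync make --store=f26.castr --chunk-size=65536 c758.caibx c758.raw
$ casync make --store=f26.castr --chunk-size=65536 5545.caibx 5545.raw
$ casync make --store=f26.castr --chunk-size=65536 4dad.caibx 4dad.raw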
chunk   image   reuse   size   files   time
---------------------------------------------
64k     c758     40%    1.1G    74k    17 min
        5545     74%    1.5G   100k     9 min
        4dad     93%    1.7G   107k     4 min
---------------------------------------------
512k    c758     37%    0.9G    11k    18 min
        5545     64%    1.6G    17k    12 min
        4dad     84%    1.8G    20k     7 min
---------------------------------------------
1M      c758     34%    0.9G     5k    19 min
        5545     56%    1.6G     9k    14 min
        4dad     77%    2.0G    11k     9 min
---------------------------------------------
8M      c758     38%    0.9G    355    24 min
        5545     37%    1.8G    713    24 min
        4dad     40%    2.6G     1k    23 min
reuse: the percentage of chunks that were reused (as reported by casync)
size: the size of the store on disk
files: the number of files in the store
time: runtime on my system
The baseline for reused chunks for these images seems to be in the high 30%.
I guess these are just empty blocks.
We're definitely not gaining anything at a chunk size of 8M. The store has
exactly the same size, and generating it takes the same time, as compressing
the image directly with `xz`. The store also grows almost linearly with the
number of images, so not many chunks seem to be reused between images.
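(For reference, the direct compression I'm comparing against is simply
something like the following; the exact flags are just what I'd reach for,
not necessarily what our image build uses:)

$ xz --keep --threads=0 <image>.raw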
The smaller chunk sizes seem to present a nice balance between the number of
files and reused chunks. Like Martin already mentioned, the bottleneck might
be HTTP requests.
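To make that concrete: on the consuming side, fetching an image would look
roughly like this (URL and file names are hypothetical), and as far as I
understand every chunk that isn't already covered by the local cache or the
seed becomes a separate HTTP request against the store:

$ casync extract --store=https://images.example.com/f26.castr \
      --seed=old-image.raw 4dad.caibx 4dad.raw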
Before we spend any more time on this, let's find out if this could give us
any tangible benefits that justify the additional complexity and developer
time. It saves quite some disk space and bandwidth (for both developers and
test machines), at the expense of long image creation times when the store
is empty.
I don't know the space and bandwidth constraints of our test runners. So I'm
not sure if this is really worth pursuing. What do you think?
Cheers
Lars
P.S.: I also ran the same tests on the compressed qcow images for comparison.
It behaved as expected: the highest chunk reuse achieved was 21% for the 64k
chunk size, but usually it was *much* lower than that. Not worth it.
I'll run another batch of tests tonight, which puts all current images into
one store. Let's see how much saving we'll get between different operating
systems.