On Mon, Dec 13, 2021 at 2:31 PM Luke Mauldin <lukemaul...@icloud.com> wrote:
>
> From reading the documentation, I can see that Subversion 1.14 supports both 
> zlib and lz4 compression.  I am running Subversion on FreeBSD 13.X on ZFS 
> which supports native zstd compression.  Some of the repos I host are 
> relatively large (60K revisions and 60GB+) and I am wondering what 
> combination will give me the best performance?  Currently, I have Subversion 
> compression disabled and ZFS with zstd compression enabled.  In this setup, 
> ZFS reports a compression ratio of 1.69X.  I would think if Subversion 
> natively supported ZSTD compression that would be best but since it does not, 
> I just wanted to see if anyone had recommendations?


As I understand it, LZ4 compression was added in 1.10 mainly for
speed. From vague memory (I haven't looked into compression
algorithms recently), zlib achieves a better compression ratio, so it
saves more disk space, but LZ4 is faster. I have no experience with
zstd yet.

It is difficult to say which compression format would give the "best"
performance for a particular application without some experimentation
because things like hardware I/O speeds and the nature of the data
being compressed affect the outcome.
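
To illustrate how much the data itself matters, here is a minimal
sketch using only Python's standard zlib module. It does not touch
Subversion at all; the sample data and the measure() helper are made up
for illustration, and the timings will differ on your hardware.

```python
import os
import time
import zlib


def measure(data: bytes, level: int) -> tuple[float, float]:
    """Return (compression ratio, seconds taken) for zlib at the given level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(data) / len(compressed), elapsed


if __name__ == "__main__":
    # Hypothetical samples: highly repetitive text versus incompressible bytes.
    samples = {
        "repetitive text": b"int main(void) { return 0; }\n" * 40000,
        "random bytes": os.urandom(1_000_000),
    }
    for name, data in samples.items():
        for level in (1, 6, 9):
            ratio, secs = measure(data, level)
            print(f"{name:16} level={level} ratio={ratio:6.2f}x "
                  f"time={secs * 1000:8.2f} ms")
```

Running the same kind of measurement on a sample of your real
repository content would be more telling than any synthetic input.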

Are you looking for the best speed, the best compression ratio, or a
good tradeoff between the two?

If you want to conserve disk space, I would suggest an experiment, if
it's feasible, on a separate machine rather than in production:
produce a dumpfile and load it twice, once into a repository using
zlib and once into one using LZ4, and then compare the resulting
on-disk sizes to that of your current repository on the zstd-compressed
volume. Keep Subversion's data deduplication feature in mind: if it
was turned off in the past or is off now, some or all of your repo
might contain duplicated data, and to make the experiment "fair" you
would need to take this into account.
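
For the size comparison itself, something like the following sketch
might help. It assumes the test repositories already exist and are
passed as command-line arguments; tree_sizes() is a hypothetical
helper, not part of Subversion. It reports both the apparent size and
the space actually allocated, because on a compressed filesystem such
as ZFS the two can differ considerably.

```python
import os
import sys


def tree_sizes(root: str) -> tuple[int, int]:
    """Return (apparent bytes, allocated bytes) for all files under root."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            # st_blocks is counted in 512-byte units on POSIX systems.
            allocated += st.st_blocks * 512
    return apparent, allocated


if __name__ == "__main__":
    # Repository paths are supplied by the caller (assumed to exist).
    for repo in sys.argv[1:]:
        apparent, allocated = tree_sizes(repo)
        print(f"{repo}: apparent={apparent / 2**20:.1f} MiB, "
              f"allocated={allocated / 2**20:.1f} MiB")
```

If the test repositories live on a dataset without filesystem
compression, the two figures should be close, and the apparent size is
then the number to compare across zlib and LZ4.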

If you are looking for the best performance in terms of speed, I don't
have a simple answer because it depends on a great many variables, of
which Subversion's compression is but one. I would assume that network
I/O probably plays a bigger role than compression here.

Hope this helps,
Nathan
