On Mon, Dec 13, 2021 at 2:31 PM Luke Mauldin <lukemaul...@icloud.com> wrote:
>
> From reading the documentation, I can see that Subversion 1.14 supports both
> zlib and lz4 compression. I am running Subversion on FreeBSD 13.X on ZFS
> which supports native zstd compression. Some of the repos I host are
> relatively large (60K revisions and 60GB+) and I am wondering what
> combination will give me the best performance? Currently, I have Subversion
> compression disabled and ZFS with zstd compression enabled. In this setup,
> ZFS reports a compression ratio of 1.69X. I would think if Subversion
> natively supported ZSTD compression that would be best but since it does not,
> I just wanted to see if anyone had recommendations?

As I understand it, the motivation for adding LZ4 compression (in 1.10) was speed. From vague memory (I haven't looked into compression algorithms recently), zlib achieves a better compression ratio, i.e. it saves more disk space, but LZ4 is faster. I have no experience with zstd yet.

It is difficult to say which compression format would give the "best" performance for a particular application without some experimentation, because things like hardware I/O speed and the nature of the data being compressed affect the outcome. Are you looking for the best speed, the best compression ratio, or a good tradeoff between the two?

If you want to conserve disk space, I would suggest (if it's feasible, and on a separate machine, not in production) producing a dumpfile and loading it twice, once into a repository configured for zlib and once into one configured for LZ4, and then comparing the resulting on-disk sizes with the size of your current repository on the zstd-compressed volume.

Note Subversion's data deduplication feature (representation sharing): if it was turned off in the past or is off now, some or all of your repo might contain duplicated data. To make the experiment "fair" you would need to take this into account.

If you are looking for the best performance in terms of speed, I don't have a simple answer, because it depends on a great many variables, of which Subversion's compression is but one. I would assume that network I/O plays a bigger role than compression here.

Hope this helps,
Nathan
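
P.S. For the size comparison, here is a rough Python sketch of how the measurement could be done once the test repositories exist (for example after `svnadmin dump` and `svnadmin load`). It is only an illustration, not a tested tool: the function names and the repository paths in the usage note are made up. It reports both the logical size (sum of file lengths) and the allocated size (from `st_blocks`), since on a compressing filesystem such as ZFS the two can differ; on platforms without `st_blocks` it falls back to the logical size.

```python
import os


def tree_sizes(root):
    """Return (logical_bytes, allocated_bytes) for every file entry under root."""
    logical = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            logical += st.st_size
            # st_blocks counts 512-byte units actually allocated; on a
            # compressed filesystem this can be smaller than st_size.
            # It is not available on every platform, hence the fallback.
            blocks = getattr(st, "st_blocks", None)
            allocated += blocks * 512 if blocks is not None else st.st_size
    return logical, allocated


def format_report(repos):
    """Build one text line per repository; repos maps a label to a path."""
    lines = []
    for label, path in repos.items():
        logical, allocated = tree_sizes(path)
        # Avoid dividing by zero for empty or missing trees.
        ratio = logical / allocated if allocated else 0.0
        lines.append(
            f"{label}: logical={logical} B, allocated={allocated} B, "
            f"ratio={ratio:.2f}x"
        )
    return "\n".join(lines)
```

Usage would be something like `print(format_report({"zlib": "/tmp/repo-zlib", "lz4": "/tmp/repo-lz4"}))`, where both paths are placeholders for wherever the two test repositories were loaded. For the zlib and LZ4 repositories on an uncompressed volume, the logical size is the number to compare; for the existing repository on the zstd volume, the allocated size is the relevant one.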