In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth:
> In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth:
> > On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
> > [email protected]> wrote:
> >
> > > Aoife Moloney wrote:
> > > > The zstd compression type was chosen to match createrepo_c settings.
> > > > As an alternative, we might want to choose xz,
> > >
> > > Since xz consistently compresses better than zstd, I would strongly
> > > suggest using xz everywhere to minimize download sizes. However:
> > >
> > > > especially after zlib-ng has been made the default in Fedora and brought
> > > > performance improvements.
> > >
> > > zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
> > > (which is mostly due to the format, so, while some implementations manage
> > > to do better than others at the expense of more compression time, there is
> > > a limit to how well they can do and it is nowhere near xz or even zstd) and
> > > should hence never be used at all.
> > >
> > >
> > There are two parts to this which users will see as 'slowness'. Part one is
> > downloading the data from a mirror. Part two is uncompressing the data. In
> > work I have been a part of, we have found that while xz gave us much
> > smaller files, the time to uncompress was so much larger that our download
> > gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
> > but uncompressed much faster than xz. This is data dependent though so it
> > would be good to see if someone could test to see if xz uncompression of
> > the datafiles will be too slow.
>
> Hi there,
>
> Ran tests with gzip 1-9 and xz 1-9 on a F41 XML file that was 940MiB.

I have now added tests with zstd levels 1-19, without using a dictionary
to improve compression any further.

Input File: f41-filelist.xml, Size: 985194446 bytes
ZStd Level 1, 1.7s to compress, 6.46% file size, 0.6s decompress
ZStd Level 2, 1.7s to compress, 6.34% file size, 0.7s decompress
ZStd Level 3, 2.1s to compress, 6.26% file size, 0.7s decompress
ZStd Level 4, 2.3s to compress, 6.26% file size, 0.7s decompress
ZStd Level 5, 5.7s to compress, 5.60% file size, 0.6s decompress
ZStd Level 6, 7.2s to compress, 5.42% file size, 0.6s decompress
ZStd Level 7, 8.1s to compress, 5.39% file size, 0.6s decompress
ZStd Level 8, 9.5s to compress, 5.31% file size, 0.6s decompress
ZStd Level 9, 10.4s to compress, 5.28% file size, 0.6s decompress
ZStd Level 10, 13.6s to compress, 5.26% file size, 0.6s decompress
ZStd Level 11, 18.4s to compress, 5.25% file size, 0.6s decompress
ZStd Level 12, 19.5s to compress, 5.25% file size, 0.6s decompress
ZStd Level 13, 30.9s to compress, 5.25% file size, 0.6s decompress
ZStd Level 14, 39.7s to compress, 5.23% file size, 0.6s decompress
ZStd Level 15, 56.1s to compress, 5.21% file size, 0.6s decompress
ZStd Level 16, 1min58s to compress, 5.52% file size, 0.7s decompress
ZStd Level 17, 2min25s to compress, 5.36% file size, 0.7s decompress
ZStd Level 18, 3min46s to compress, 5.43% file size, 0.8s decompress
ZStd Level 19, 10min36s to compress, 4.66% file size, 0.7s decompress
So to save 5.2MB in file size (level 19 vs level 15), the server has to
spend eleven times longer compressing the file (and I did not look at
resources like CPU or RAM while doing this). I am sure there are other
compression mechanisms that can squeeze these files a bit further, but at
what cost? If it is a once-a-day event, maybe a high compression ratio is
justifiable. If it has to happen hundreds of times per day - not so much.
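
For anyone who wants to re-check that arithmetic, here is a small sketch
that recomputes both figures from the table above. The helper names
bytes_saved and time_ratio are made up for this example, and 10min36s is
entered as 636 seconds.

```shell
#!/usr/bin/env bash
# Figures copied from the results table above.
INPUT_BYTES=985194446
LVL15_PCT=5.21; LVL15_SECS=56.1
LVL19_PCT=4.66; LVL19_SECS=636   # 10min36s

# Bytes saved when the compressed size drops from $2 % to $3 % of $1 bytes.
# (bytes_saved is a hypothetical helper name.)
bytes_saved() {
    awk -v total="$1" -v a="$2" -v b="$3" \
        'BEGIN { printf "%.0f\n", total * (a - b) / 100 }'
}

# How many times longer $2 seconds is than $1 seconds.
# (time_ratio is a hypothetical helper name.)
time_ratio() {
    awk -v base="$1" -v other="$2" 'BEGIN { printf "%.1f\n", other / base }'
}

saved=$(bytes_saved "$INPUT_BYTES" "$LVL15_PCT" "$LVL19_PCT")
mib=$(awk -v b="$saved" 'BEGIN { printf "%.1f", b / 1048576 }')
echo "Level 19 saves ${saved} bytes (${mib} MiB) over level 15"
echo "Level 19 takes $(time_ratio "$LVL15_SECS" "$LVL19_SECS")x as long to compress"
```

The percentages in the table are rounded to two decimals, so the byte
count is only approximate.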
## zstd
# Expects INPUTFILE and INPUTFILESIZE (in bytes) to be set by the caller.
function do_zstd()
{
    local cl=1
    echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
    echo
    while [[ $cl -le 19 ]]
    do
        echo "ZStd compression level ${cl}"
        echo "Time to compress the file"
        time zstd -z -${cl} "${INPUTFILE}"
        # Field 5 of "ls -ln" is the file size in bytes.
        COMPRESSED_SIZE=$(ls -ln "${INPUTFILE}.zst" | awk '{print $5}')
        echo "Compressed to"
        echo "scale=5
${COMPRESSED_SIZE}/${INPUTFILESIZE}*100
"|bc
        echo "% of original"
        echo "Time to decompress the file, output to /dev/null"
        time zstd -d -c "${INPUTFILE}.zst" > /dev/null
        # Remove the compressed file so the next level starts clean.
        rm -f "${INPUTFILE}.zst"
        cl=$((cl + 1))
        echo
    done
}
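
The function above prints its timings over several lines, while the
results further up are condensed to one line per level. A rough sketch of
how such a one-line summary could be produced directly is below. It is
not the script that generated the table; pct_of, fmt_secs, elapsed and
summarize_level are hypothetical names, and it assumes GNU date (for %N)
and GNU stat (for -c %s), as found on Fedora.

```shell
#!/usr/bin/env bash
# Sketch only: one summary line per zstd level.
# All function names here are hypothetical helpers for this example.

# Size $1 as a percentage of size $2, two decimals.
pct_of() {
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.2f\n", c / o * 100 }'
}

# Format seconds as "1.7s" below one minute, otherwise like "1min58s".
fmt_secs() {
    awk -v s="$1" 'BEGIN {
        if (s < 60) printf "%.1fs\n", s
        else        printf "%dmin%ds\n", int(s / 60), int(s % 60)
    }'
}

# Wall-clock seconds taken by the given command; its stdout is discarded.
elapsed() {
    local start end
    start=$(date +%s.%N)
    "$@" > /dev/null
    end=$(date +%s.%N)
    awk -v a="$start" -v b="$end" 'BEGIN { printf "%.1f\n", b - a }'
}

# Compress file $2 at level $1, time both directions, print one line.
summarize_level() {
    local level=$1 in=$2 out="$2.zst" csecs dsecs size
    csecs=$(elapsed zstd -q -z "-${level}" -f -o "$out" "$in")
    size=$(stat -c %s "$out")
    dsecs=$(elapsed zstd -q -d -c "$out")
    rm -f "$out"
    echo "ZStd Level ${level}, $(fmt_secs "$csecs") to compress," \
         "$(pct_of "$size" "$(stat -c %s "$in")")% file size," \
         "$(fmt_secs "$dsecs") decompress"
}

# Only run the benchmark when a file name is given as the first argument.
if [[ -n ${1:-} ]]; then
    for level in $(seq 1 19); do
        summarize_level "$level" "$1"
    done
fi
```

Saved as, say, zstd-summary.sh (the file name is just an example), it
would be run as: bash zstd-summary.sh f41-filelist.xml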
--
Kind regards,
/S