Heather,

thanks, now fixed (datasets was using the numeric value for compress= instead of the compression name, so it picked zstd instead of gzip; the switch order is now kept the same).

Cheers,
Simon


> On Jan 15, 2025, at 10:21 PM, Heather Turner <h...@heatherturner.net> wrote:
>
> With the changes to add zstd support yesterday, the build of R-devel is
> failing when zstd is not present, even though the docs say that zstd is
> optional.
>
> The error comes in building the datasets package, see e.g.
> https://github.com/r-devel/r-svn/actions/runs/12760693086/job/35566530112.
>
> Best wishes,
>
> Heather
>
> On Mon, Jan 13, 2025, at 1:26 AM, Simon Urbanek wrote:
>> I think the first step would have to be to add zstd support to R. zstd
>> is a bit controversial (as shown by the community blowback of the
>> changes you mentioned) and their build system (calling it that is being
>> very generous) is a mess so it would require a bit of testing, but it is
>> doable.
>>
>> That said, assuming the above is solved, we have been debating the
>> change of compression at CRAN in general for a bit, but the assumptions
>> about the file names are built into today's tools so there would
>> certainly be some fall-out - not just in R, but also the ecosystems around
>> it. As you pointed out, possibly the safest place to start is
>> binaries, since we have tighter control of those and they are used in
>> fewer places.
>>
>> Personally, I think the higher priority is signing, so as we address
>> that we may just include the compression change with it since it will
>> require some tool changes anyway. I was thinking of using xz as that is
>> more stable, already supported and less controversial, but I don't
>> think the choice really matters - it just has to be a compression which
>> R supports (zstd and xz have different benefits, so it's always a
>> trade-off without a clear winner).
>>
>> Cheers,
>> Simon
>>
>>
>>> On 11 Jan 2025, at 12:16, Jeroen Ooms <jeroeno...@gmail.com> wrote:
>>>
>>> Many distros and browsers these days use zstd as the preferred
>>> compression method.
>>> For example, if you unpack a .deb or .rpm file on
>>> Debian or Fedora there is a zstd archive inside. It is claimed that zstd
>>> offers improved compression over gzip, but (unlike lzma) it has
>>> comparable decompression speed. Maybe it is interesting to get an
>>> estimate of how much R packages would benefit from zstd.
>>>
>>> Testing this for source packages and macOS binary packages is easy,
>>> as we can gunzip and recompress tar.gz files without having to extract
>>> the tarball itself:
>>>
>>> OUTPUT="sizes.txt"
>>> echo "FILE GZIP ZSTD" > "$OUTPUT"
>>> for x in *gz; do
>>>   FILE=$(basename "$x")
>>>   GZIP=$(wc -c "$x" | awk '{print $1}')
>>>   ZSTD=$(gunzip -c "$x" | zstd -19 | wc -c)
>>>   echo "$FILE $GZIP $ZSTD" | tee -a "$OUTPUT"
>>> done
>>>
>>> Attached are results of running this script on the 500 most downloaded
>>> CRAN packages. It shows about 16% size reduction for sources, and 19%
>>> for binaries.
>>>
>>> Zstd is BSD licensed C code that can easily be embedded in any project.
>>> <sources.txt><binaries.txt>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
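
A note on reproducing the quoted percentages: the thread gives the figures (about 16% for sources, 19% for binaries) but not how they were computed from the attached tables. The sketch below is one plausible way, assuming a whitespace-separated table with the "FILE GZIP ZSTD" header that the quoted script writes to sizes.txt, and assuming the reduction is taken over total bytes rather than averaged per package. The function name summarize_sizes is hypothetical and does not appear in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): summarize a "FILE GZIP ZSTD" table.
# Assumes file names contain no whitespace, so the sizes are columns 2 and 3.
summarize_sizes() {
    awk '
        NR > 1 && $2 > 0 {          # skip the header row and any empty archives
            gz += $2                # running total of gzip sizes in bytes
            zs += $3                # running total of zstd sizes in bytes
            n++
        }
        END {
            if (n == 0) { print "no data"; exit 1 }
            # Aggregate reduction over total bytes, not a mean of per-file ratios.
            printf "%d files: gzip=%.0f zstd=%.0f reduction=%.1f%%\n",
                   n, gz, zs, 100 * (1 - zs / gz)
        }
    ' "$1"
}
```

Usage would be `summarize_sizes sizes.txt` in the directory where the quoted loop was run. A per-package mean can differ from this aggregate figure when a few large packages dominate the total.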