Hi!

[ Sorry, I have been meaning to update this report, as I've mentioned
  when replying directly to people who asked about its current state
  on IRC, but it seems I never got to it, beyond what was already
  covered on the debian-devel mailing list. ]

On Sat, 2022-04-09 at 18:52:07 +0200, Helmut Grohne wrote:
> would you maybe reconsider adding zstd decompression support at this
> time?

I'm always open to reconsideration. :)

> On Sun, Mar 18, 2018 at 04:38:15AM +0100, Guillem Jover wrote:
> > So, the items that come to mind (most from the dpkg FAQ [F]:
> > 
> > * Availability in general Unix systems would be one. I think the code
> >   should be portable, but I've not checked properly.
> 
> Given the number of places it has been vendored and used at this time, I
> suppose we'd have seen issues. While there is optimized assembly for
> x86_64 cpus and uses e.g. __builtin_ctzll, all those uses are carefully
> guarded and have portable alternative implementations. Do you see any
> particular unixes to watch out here? From a processor architecture pov,
> I've never seen issues with zstd in e.g. rebootstrap. (The present
> failure for riscv64 likely isn't caused by zstd itself.)

Right, probably not a concern now, yes.

> > * Size of the shared library another, it would be by far the fattest
> >   compression lib used by dpkg. It's not entirely clear whether the
> >   shlib embeds a zlib library?
> 
> What made you think so? Is it the zlibWrapper directory in the source?
> That's an api adapter of the gzip interface to the zstd compressor.

I don't recall now, but rechecking its source, that looks like what
might have triggered that question. Several of the files, f.ex. gzlib.c,
gzread.c, gzwrite.c, etc., seem to be locally forked zlib sources. I
don't recall checking (most probably not, hence the question above)
whether that was included in the resulting shared library, was just
some kind of "contrib" thing, or something else.

> The size remains a possible issue otherwise.

Yes.

> > * Increase in the (build-)essential set (directly and transitively).
> 
> We're now in a place where libzstd1 is transitively essential.

Ah, thanks, I don't recall this being mentioned before (at least on IRC).

Right, in Debian this is coming now from util-linux (Essential:yes)
depending on libsystemd0 depending on libzstd1.
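
As a rough illustration of what "transitively essential" means here, a
minimal sketch that walks a dependency table. The DEPENDS table is a
hand-written toy containing only the chain named above, and the
function name is made up for this example; real data would come from
the package index.

```python
from collections import deque

# Toy dependency table, hand-written for illustration only. It lists
# just the chain mentioned above, not the real package metadata.
DEPENDS = {
    "util-linux": ["libsystemd0"],
    "libsystemd0": ["libzstd1"],
    "libzstd1": [],
}


def transitive_closure(roots, depends):
    """Return every package reachable from roots by following depends."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        for dep in depends.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


if __name__ == "__main__":
    # Start from the Essential:yes package and collect what it pulls in.
    pulled_in = transitive_closure(["util-linux"], DEPENDS)
    print(sorted(pulled_in))
```

Anything in the resulting set is installed on every system that has the
root package, whether or not it carries the Essential field itself.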

> > * It also seems the format has changed quite some times already, and
> >   it's probably the reason for the fat shlib. Not sure if the format
> >   has stabilized enough to use this as good long-term storage format,
> >   and what's the policy regarding supporting old formats for example,
> >   given that this is intended mainly to be used for real-time and
> >   streaming content and similar. For example the Makefile for libzstd
> >   defaults to supporting v0.4+ only, which does not look great.
> 
> Given the state of development and the wide adoption, it would seem
> unlikely to me to have it break more compatibility.

I'd expect so too, at least now.

> Also note that there
> is a trade-off here between size and compatibility. You cannot have both
> a small size and support all ancient formats.

Sure, but I thought given that this is a property of the state of the
upstream zstd format, it was relevant to mention.

> Beyond these cases, I think compatibility also goes the other way round.
> If a significant portion of .debs in the wild are compressed using zstd
> (and that's what we're seeing), dpkg should be able to decompress them
> even if it wasn't the one that introduced them. You care very much about
> being able to decompress each and every ancient .deb, but in practice we
> also care about decompressing those .debs that currently reside in
> Ubuntu's PPAs.

That's certainly all true, and it is something that is bothering me
too. :/

As I've mentioned in the past on the debian-devel mailing list
(AFAIR) and/or on IRC, the way I've seen this was that:

  - zstd offered a different trade-off on compression and
    decompression times vs size, which might be relevant depending on
    what is the bottleneck for users or buildds, say network, cpu,
    disk, or for "throw-away", "rolling" vs "stable" builds, etc,
    where we have to use a single compressor for all cases, instead
    of say one per use-case.
  - There's xz threaded decompression now merged in liblzma upstream,
    and I've got the patches for dpkg ready for it, and I was
    investigating the zlib-ng alternative which would somewhat improve
    on the speed vs size divide.
  - The Ubuntu people went ahead with the divergence anyway, even
    after being told there was no guarantee this would be added. This
    in a way makes upstream subservient to downstreams diverging the
    format. For example, another downstream has added .lz support;
    it is of course nowhere near as widespread, but it illustrates
    the point.
  - Adding new compression support, if the needs seem somewhat covered
    already by existing ones, implies a maintenance burden practically
    for eternity. For example, when the bzip2 upstream situation got
    pretty dire, and then it was picked up but with code being
    switched to Rust (which at the time implied portability would get
    affected), I had to consider whether I'd need to fork bzip2
    to keep it functional. The compression landscape has
    unfortunately this inherent property that once a clearly superior
    format comes along, it leaves behind the carcasses of once thought
    most glorious contenders.

But, yes this is a problem now. :(
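
To make the divergence concrete: a .deb is an ar archive whose data
member is named data.tar plus a compressor extension, so a tool that
does not know an extension cannot unpack the package at all. A minimal
sketch of that lookup follows. The table and the function name are
hypothetical and only illustrate the idea; this is not dpkg's actual
code.

```python
# Hypothetical lookup table: extension of the data.tar member mapped to
# the compressor name. Deliberately left without an entry for ".zst".
KNOWN = {
    "": "none",
    ".gz": "gzip",
    ".bz2": "bzip2",
    ".lzma": "lzma",
    ".xz": "xz",
}


def compressor_for(member, known=KNOWN):
    """Map a member name such as 'data.tar.xz' to a compressor name."""
    prefix = "data.tar"
    if not member.startswith(prefix):
        raise ValueError(f"not a data member: {member!r}")
    ext = member[len(prefix):]
    try:
        return known[ext]
    except KeyError:
        raise ValueError(f"unknown compression for member {member!r}") from None


if __name__ == "__main__":
    print(compressor_for("data.tar.xz"))
    # A table extended with ".zst" accepts the diverged packages too.
    extended = dict(KNOWN, **{".zst": "zstd"})
    print(compressor_for("data.tar.zst", extended))
```

With the base table a data.tar.zst member is rejected outright, which
is the situation a user hits when a package comes from an archive that
has already switched.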

> In my personal workflow, I decompress very many packages into tmpfs (or
> ram). This is bottle-necked on CPU. In my experience, zstd decompression
> is almost 100 times faster than xz decompression. That's a fairly big
> improvement. At this time, I'm convinced that zstd is better for the
> "compress once, decompress often" use case than xz, which still excels
> at "compress once, decompress rarely". I admit that I only get the
> benefits if dpkg also supports zstd as a compressor and many relevant
> packages switch to it.
> 
> So at this point, I think that supporting zstd decompression is
> something reasonable to add to dpkg. Please reconsider your decision.

I'm not thrilled about how this was handled TBH, and the pressure that
has now formed around this. But if the upcoming threaded xz
decompression support is not going to be satisfactory enough anyway,
and given the ".deb ecosystem" wide divergence, I guess it might be
reasonable to add this. :/

Depending on the intended use, say local use, one possibility in
Debian could be to build the support against the zstd CLI instead of
the shared library, and make it a Recommends or similar. Of course that
has the additional trade-off of increasing the installation size even
more, and the error reporting for a missing CLI might need to be
improved.
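
The CLI route and its error condition could look roughly like the
sketch below. The function and exception names are invented for this
example, and the code only shows the general shape (find the program,
fail with a clear message if it is absent, otherwise pipe the data
through it); it is not how dpkg implements its command-based
compressors.

```python
import shutil
import subprocess


class MissingDecompressor(RuntimeError):
    """Raised when the external decompressor program is not installed."""


def decompress_with_cli(data, command="zstd"):
    """Decompress bytes by piping them through an external program."""
    path = shutil.which(command)
    if path is None:
        # With only a Recommends the program may legitimately be absent,
        # so say so explicitly instead of failing on exec.
        raise MissingDecompressor(
            f"cannot decompress: the '{command}' program is not installed"
        )
    # -d decompresses and -c writes to stdout; with no file operand the
    # zstd program reads its input from stdin.
    result = subprocess.run(
        [path, "-d", "-c"],
        input=data,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

A caller would catch MissingDecompressor and tell the user which
package to install, rather than reporting a generic unpack failure.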

Thanks,
Guillem
