Re: [gentoo-dev] [PATCH] cargo.eclass: Emit a warning if the package uses 300+ crates

2025-01-13 Thread Florian Schmaus

On 12/01/2025 13.56, Michał Górny wrote:

> Emit a QA warning suggesting the use of a crate tarball, when the package
> in question uses 300 crates or more.  Such long crate lists cause
> ebuilds and Manifests to grow very fast, causing significant space
> consumption on end user systems (including users who are not using
> the package in question) and git history growth.  On top of that,
> fetching that many crates takes significant time.
>
> The number of 300 is pretty arbitrary, chosen approximately to match
> Manifests that are over 100 KiB in size.  We should probably look into
> lowering in the future, as more packages are transitioned.
Thanks for your proposal. I know you wrote it because Gentoo is 
important to you.


I am sorry, however, but the arbitrary limit you propose is harmful, and 
its necessity is questionable.


It is unnecessary, at least in its current form, because the size growth 
of Gentoo's package repository is manageable. See the previous analysis 
for EGO_SUM [1].


What is more worrisome, however, is that it is harmful.

First, switching from individual crates to a single crate tarball 
disallows inter-package crate archive reuse. Often, users will already 
have the required crates downloaded because another installed package 
used them. With an artificial create count limit, users must download 
rather large crate tarballs, causing unnecessary traffic and increasing 
the disk space on Gentoo's mirrors and end-user systems. The crate 
tarballs quickly eat away the saved disk space in the ebuild repository.
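
The reuse argument can be made concrete with a small sketch. The Python below assumes crates are identified by `name@version` strings and that an already-downloaded distfile is reused only on an exact name and version match. The crate lists are invented for illustration, and `parse_crates` and `fetch_plan` are hypothetical helpers, not part of Portage or cargo.eclass.

```python
def parse_crates(crates_var: str) -> set[str]:
    """Split a whitespace-separated, CRATES-style string into a set of entries."""
    return set(crates_var.split())


def fetch_plan(already_downloaded: set[str], needed: set[str]) -> tuple[set[str], set[str]]:
    """Return (reused, to_fetch) for a package that needs the given crates.

    Assumption: a crate counts as reusable only if the exact same
    name@version entry was downloaded earlier for another package.
    """
    reused = needed & already_downloaded
    to_fetch = needed - already_downloaded
    return reused, to_fetch


if __name__ == "__main__":
    # Invented example data: crates fetched earlier vs. crates a new package lists.
    downloaded = parse_crates("cfg-if@1.0.0 libc@0.2.150 serde@1.0.190")
    needed = parse_crates("cfg-if@1.0.0 libc@0.2.150 memchr@2.6.4")
    reused, to_fetch = fetch_plan(downloaded, needed)
    print(f"reused: {sorted(reused)}")
    print(f"to fetch: {sorted(to_fetch)}")
```

With a single crate tarball per package, the `reused` set is always empty, because the tarball is a distinct distfile regardless of what it contains. How large that set is in practice depends on how often installed packages pin identical crate versions.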


Even worse, crate tarballs negatively impact the security of Gentoo 
users as they make it harder to audit ebuilds, and third-party crate 
tarballs add a further distinct party that can inject malicious code. 
Considering the recent supply chain attacks, this alone is a show-stopper.


Why is this warning suddenly necessary? Did a user run into an issue 
caused by more than 300 entries?


- Flow

1: 
https://public-inbox.gentoo.org/gentoo-dev/6ed0f286-f9eb-9e93-4fec-296646f79...@gentoo.org/
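
As a rough sanity check on the sizing rule of thumb quoted above (about 300 crates for a Manifest of roughly 100 KiB), the following Python sketch estimates how much each crate adds to a Manifest. It assumes every distfile contributes one DIST line carrying BLAKE2B and SHA512 digests. The crate names and file contents are made up, and `manifest_dist_line` and `estimated_manifest_bytes` are hypothetical helpers, not part of any Gentoo tooling.

```python
import hashlib


def manifest_dist_line(filename: str, payload: bytes) -> str:
    """Build one Manifest-style DIST line (assumed layout) for a distfile."""
    blake2b = hashlib.blake2b(payload).hexdigest()  # 64-byte digest, 128 hex chars
    sha512 = hashlib.sha512(payload).hexdigest()    # 64-byte digest, 128 hex chars
    return f"DIST {filename} {len(payload)} BLAKE2B {blake2b} SHA512 {sha512}"


def estimated_manifest_bytes(crate_count: int) -> int:
    """Sum the DIST line lengths (plus newlines) for made-up crate files."""
    total = 0
    for i in range(crate_count):
        name = f"example-crate-{i}-1.0.0.crate"  # hypothetical file name
        payload = b"x" * (10_000 + i)            # placeholder contents
        total += len(manifest_dist_line(name, payload)) + 1  # +1 for the newline
    return total


if __name__ == "__main__":
    size = estimated_manifest_bytes(300)
    print(f"{size} bytes, about {size / 1024:.1f} KiB")
```

Under these assumptions each entry costs a little over 300 bytes, dominated by the two 128-character digests, so 300 crates land in the neighborhood of 90 KiB. That is consistent with the figure in the patch description; real crate names vary in length, so actual Manifests will differ somewhat.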





Re: [gentoo-dev] [PATCH] cargo.eclass: Emit a warning if the package uses 300+ crates

2025-01-13 Thread Michał Górny
On Mon, 2025-01-13 at 10:40 +0100, Florian Schmaus wrote:
> First, switching from individual crates to a single crate tarball 
> disallows inter-package crate archive reuse. Often, users will already 
> have the required crates downloaded because another installed package 
> used them. With an artificial crate count limit, users must download 
> rather large crate tarballs, causing unnecessary traffic and increasing 
> the disk space on Gentoo's mirrors and end-user systems. The crate 
> tarballs quickly eat away the saved disk space in the ebuild repository.

I'm sure you've also done a thorough analysis on how much crate reuse
actually happens, as well as of the impact of adding thousands of tiny
files to Gentoo mirrors, the inefficiency of fetching them one by one,
and especially how badly crates.io actually handles that.

I'm also sure you've done a thorough analysis of actual disk space use,
that also takes into consideration the space wasted by thousands of
tiny, inefficiently compressed files, compared to crate tarballs that
benefit both from a much stronger compression algorithm and from
the opportunity to process much larger data blocks.
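
The compression point can be illustrated with a toy comparison. The Python sketch below assumes individual crates are gzip-compressed and that a bundled tarball would use xz. It compresses synthetic byte strings rather than real tar archives, so the numbers only show the direction of the effect, not its size for real crates. `make_sources` and the two size functions are hypothetical helpers.

```python
import gzip
import lzma


def make_sources(count: int) -> list[bytes]:
    """Generate small, similar files: shared boilerplate plus one unique line."""
    boilerplate = b"".join(
        b"// line %d of a license header shared by many small source files\n" % n
        for n in range(20)
    )
    return [boilerplate + b"pub const ID: u32 = %d;\n" % i for i in range(count)]


def size_compressed_individually(files: list[bytes]) -> int:
    """Total size when every file is gzip-compressed on its own."""
    return sum(len(gzip.compress(data)) for data in files)


def size_compressed_together(files: list[bytes]) -> int:
    """Size when all files are concatenated and xz-compressed as one stream."""
    return len(lzma.compress(b"".join(files)))


if __name__ == "__main__":
    files = make_sources(200)
    print("individually (gzip):", size_compressed_individually(files))
    print("together (xz):      ", size_compressed_together(files))
```

Each separately compressed file pays its own header overhead and cannot share redundancy with its neighbors, while the single stream can. Synthetic data like this exaggerates the redundancy, so measuring real distfiles would be needed to quantify the difference.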

> Even worse, crate tarballs negatively impact the security of Gentoo 
> users as they make it harder to audit ebuilds, and third-party crate 
> tarballs add a further distinct party that can inject malicious code. 
> Considering the recent supply chain attacks, this alone is a show-stopper.

`cargo audit` does not care about how crates are delivered to Gentoo
systems.

> Why is this warning suddenly necessary? Did a user run into an issue 
> caused by more than 300 entries?

It is not "sudden".  It is an ongoing effort.


-- 
Best regards,
Michał Górny





Re: [gentoo-dev] [PATCH] cargo.eclass: Emit a warning if the package uses 300+ crates

2025-01-13 Thread orbea
On Mon, 13 Jan 2025 10:40:30 +0100
Florian Schmaus wrote:

> On 12/01/2025 13.56, Michał Górny wrote:
> > Emit a QA warning suggesting the use of a crate tarball, when the
> > package in question uses 300 crates or more.  Such long crate
> > lists cause ebuilds and Manifests to grow very fast, causing
> > significant space consumption on end user systems (including users
> > who are not using the package in question) and git history growth.
> > On top of that, fetching that many crates takes significant time.
> > 
> > The number of 300 is pretty arbitrary, chosen approximately to match
> > Manifests that are over 100 KiB in size.  We should probably look
> > into lowering in the future, as more packages are transitioned.  
> Thanks for your proposal. I know you wrote it because Gentoo is 
> important to you.
> 
> I am sorry, however, but the arbitrary limit you propose is harmful,
> and its necessity is questionable.

It's worth pointing out that this is already being done in Gentoo; see
dev-util/maturin for one example.

> 
> It is unnecessary, at least in its current form, because the size
> growth of Gentoo's package repository is manageable. See the previous
> analysis for EGO_SUM [1].
> 
> What is more worrisome, however, is that it is harmful.
> 
> First, switching from individual crates to a single crate tarball 
> disallows inter-package crate archive reuse. Often, users will
> already have the required crates downloaded because another installed
> package used them. With an artificial crate count limit, users must
> download rather large crate tarballs, causing unnecessary traffic and
> increasing the disk space on Gentoo's mirrors and end-user systems.
> The crate tarballs quickly eat away the saved disk space in the
> ebuild repository.
> 
> Even worse, crate tarballs negatively impact the security of Gentoo 
> users as they make it harder to audit ebuilds, and third-party crate 
> tarballs add a further distinct party that can inject malicious code. 
> Considering the recent supply chain attacks, this alone is a
> show-stopper.
> 
> Why is this warning suddenly necessary? Did a user run into an issue 
> caused by more than 300 entries?
> 
> - Flow
> 
> 1: 
> https://public-inbox.gentoo.org/gentoo-dev/6ed0f286-f9eb-9e93-4fec-296646f79...@gentoo.org/
> 
> 




Re: [gentoo-dev] [PATCH] cargo.eclass: Emit a warning if the package uses 300+ crates

2025-01-13 Thread Ionen Wolkens
On Mon, Jan 13, 2025 at 05:23:54AM -0800, orbea wrote:
> On Mon, 13 Jan 2025 10:40:30 +0100
> Florian Schmaus wrote:
> 
> > On 12/01/2025 13.56, Michał Górny wrote:
> > > Emit a QA warning suggesting the use of a crate tarball, when the
> > > package in question uses 300 crates or more.  Such long crate
> > > lists cause ebuilds and Manifests to grow very fast, causing
> > > significant space consumption on end user systems (including users
> > > who are not using the package in question) and git history growth.
> > > On top of that, fetching that many crates takes significant time.
> > > 
> > > The number of 300 is pretty arbitrary, chosen approximately to match
> > > Manifests that are over 100 KiB in size.  We should probably look
> > > into lowering in the future, as more packages are transitioned.  
> > Thanks for your proposal. I know you wrote it because Gentoo is 
> > important to you.
> > 
> > I am sorry, however, but the arbitrary limit you propose is harmful,
> > and its necessity is questionable.
> 
> It's worth pointing out that this is already being done in Gentoo; see
> dev-util/maturin for one example.

FTR, this is something I was planning to do either way, but I kept
procrastinating given that the package needs special handling for
crates used by tests (it builds separate Rust packages for its
tests, with their own crates). This just prompted me to finally
have a look before a potential warning hits.
-- 
ionen

