Hi Guillem and others,

Thanks for your extensive reply and the followup clarifying the
inside-out and outside-in distinction.

On Wed, Dec 04, 2024 at 02:03:29PM +0100, Guillem Jover wrote:
> On Thu, 2024-11-28 at 10:54:37 +0100, Helmut Grohne wrote:
> > I think this demonstrates that we probably have something between 10 and
> > 50 packages in unstable that would benefit from a generic parallelism
> > limit based on available RAM. Do others agree that this is a problem
> > worth solving in a more general way?
> 
> I think the general idea make sense, yes.

Given the other replies on this thread, I conclude that we have rough
consensus on this being a problem worth solving (expending effort and
code and later maintenance cost on).

> > For one thing, I propose extending debhelper to provide
> > --min-ram-per-parallel-core as that seems to be the most common way to
> > do it. I've proposed
> > https://salsa.debian.org/debian/debhelper/-/merge_requests/128
> > to this end.
> 
> To me this looks too high in the stack (and too Linux-specific :).

Let me take the opportunity to characterize this proposal inside-out
given your distinction.

I don't think being Linux-specific is necessarily bad here, and I note
that the /proc interface is also supported by Hurd (I actually checked
on a porter box). The problem we are solving is a practical one, and
the solution we pick now will probably no longer be relevant in twenty
years. That is about how long I expect Linux to remain the preferred
kernel used by Debian (it could be longer, but is unlikely to be
shorter).
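To make the mechanism concrete, here is a minimal sketch of deriving a
RAM-based job limit from /proc/meminfo. The 1 GiB-per-job figure is an
assumption for illustration, and the fallback branch is only there to
keep the sketch runnable on systems without /proc:

```shell
# Sketch: cap parallelism by available RAM, assuming 1 GiB per job
# (an illustrative value, not a recommendation).
min_ram_per_job_kb=$((1 * 1024 * 1024))   # 1 GiB expressed in kB

if [ -r /proc/meminfo ]; then
  # MemAvailable is provided by Linux and by Hurd's /proc.
  mem_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
else
  mem_kb=$min_ram_per_job_kb              # fallback: assume one job's worth
fi

max_jobs=$((mem_kb / min_ram_per_job_kb))
[ "$max_jobs" -ge 1 ] || max_jobs=1       # never drop below one job
echo "$max_jobs"
```

This is essentially what the packages doing their own limiting in
debian/rules already compute, each in its own slightly different way.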

> I think adding this in dpkg-buildpackage itself would make most sense
> to me, where it is already deciding what amount of parallelism to use
> when specifying «auto» for example.
> 
> Given that this would be and outside-in interface, I think this would
> imply declaring these parameters say as debian/control fields for example,
> or some other file to be parsed from the source tree.

I find the outside-in vs. inside-out distinction quite useful, but I
actually prefer an inside-out approach. You explain that picking a
sensible ram-per-core value is environment-specific. Others gave
examples of build systems addressing this by specifying link groups
with reduced parallelism, and you went into detail on how compression
parallelism is already limited based on system RAM. Given all of this,
I am no longer convinced that reducing the package-global parallelism
is the desired solution. Rather, each individual step may benefit from
its own limit, and that is what is already happening in the archive. It
is this inside-out approach that we see in debian/rules in some
packages. What I now find missing is better tooling to support it.

> My main concerns would be:
> 
>   * Portability.

I am not concerned. The parallelism limit is a mechanism to increase
the efficiency of builder deployments and not much more. The portable
solution is to stuff in more RAM or supply a lower parallel value
outside-in. A 90% solution is more than good enough here.

>   * Whether this is a local property of the package (so that the
>     maintainer has the needed information to decide on a value, or
>     whether this depends on the builder's setup, or perhaps both).

All of what I wrote in this thread thus far assumed that this was a
local property. That is definitely an oversimplification, as an
upgraded clang, gcc, ghc or rustc has historically yielded increased
RAM consumption. However, the packages affected tend to be sensitive to
changes in these packages in other ways, so their maintainers generally
know quite closely which versions of their dependencies will be in use
and can tailor their guesses. So while this is a non-local property in
principle, my expectation is that treating it as if it were local is
good enough for a 90% solution.

>   * We might need a way to percolate these parameters to children of
>     the build/test system (as Paul has mentioned), where some times
>     you cannot specify this directly in the parent. Setting some
>     standardize environment variables would seem sufficient I think,
>     but while all this seems kind of optional, this goes a bit into
>     reliance on dpkg-buildpackage being the only supported build
>     entry point. :)

To me, this reads as an argument for using an inside-out approach.

Given all of the other replies (on-list and off-list), my vision of how
I'd like to see this approached has changed. I see more and more value
in leaving this in close control of the package maintainer (i.e.
inside-out) to the point where different parts of the build may use
different limits.

How about we instead try extending coreutils' nproc with additional
options, such as:

  --assume-units=N
  --max-units=N
  --min-ram-per-unit=Z

Then we could continue to use buildopts.mk and other mechanisms to
extract the passed parallel value from DEB_BUILD_OPTIONS as before, and
run it through an nproc invocation before passing it down to the build
system in whatever way that build system requires. More options could
be added to nproc as needed.

Helmut
