Hi Guillem and others,

Thanks for your extensive reply and the followup clarifying the
inside-out and outside-in distinction.
On Wed, Dec 04, 2024 at 02:03:29PM +0100, Guillem Jover wrote:
> On Thu, 2024-11-28 at 10:54:37 +0100, Helmut Grohne wrote:
> > I think this demonstrates that we probably have something between 10 and
> > 50 packages in unstable that would benefit from a generic parallelism
> > limit based on available RAM. Do others agree that this is a problem
> > worth solving in a more general way?
>
> I think the general idea make sense, yes.

Given the other replies on this thread, I conclude that we have rough
consensus on this being a problem worth solving (i.e. worth expending
effort, code, and later maintenance cost on).

> > For one thing, I propose extending debhelper to provide
> > --min-ram-per-parallel-core as that seems to be the most common way to
> > do it. I've proposed
> > https://salsa.debian.org/debian/debhelper/-/merge_requests/128
> > to this end.
>
> To me this looks too high in the stack (and too Linux-specific :).

Let me take the opportunity to characterize this proposal inside-out,
given your distinction. I don't think being Linux-specific is
necessarily bad here; note that the /proc interface is also supported
by Hurd (I actually checked on a porter box). The problem we are
solving is a practical one, and the solution we pick now will probably
no longer be relevant in twenty years. That's about the time frame for
which I expect Linux to remain the preferred kernel used by Debian (it
could be longer, but is unlikely to be shorter).

> I think adding this in dpkg-buildpackage itself would make most sense
> to me, where it is already deciding what amount of parallelism to use
> when specifying «auto» for example.
>
> Given that this would be and outside-in interface, I think this would
> imply declaring these parameters say as debian/control fields for example,
> or some other file to be parsed from the source tree.

I find the outside-in vs. inside-out distinction quite useful, but I
actually prefer an inside-out approach.
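For concreteness, the arithmetic behind a --min-ram-per-parallel-core
style limit could be sketched as follows. The limit_jobs helper and
the 2048 MiB figure are mine for illustration, not debhelper's actual
implementation:

```shell
#!/bin/sh
# Sketch only: cap a requested parallelism at total RAM divided by
# a per-job minimum. limit_jobs and the 2048 MiB value are
# illustrative, not taken from the debhelper merge request.
limit_jobs() {
	# $1: total RAM in MiB, $2: requested jobs, $3: MiB per job
	by_ram=$(($1 / $3))
	if [ "$by_ram" -lt 1 ]; then by_ram=1; fi
	if [ "$by_ram" -lt "$2" ]; then echo "$by_ram"; else echo "$2"; fi
}

# On Linux (and on Hurd's procfs), total RAM can be read from /proc:
if [ -r /proc/meminfo ]; then
	ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
	limit_jobs "$ram_mb" "$(nproc)" 2048
fi
```

On a 16-core box with 8 GiB of RAM this would yield 4 jobs instead of
16, while a machine with plenty of RAM keeps its full core count.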
You detail that picking a sensible ram-per-core value is
environment-specific. Others gave examples of how build systems
address this, e.g. by specifying linker groups with reduced
parallelism, and you detail how compression parallelism is already
limited based on system RAM. Given all of this, I am no longer
convinced that reducing the package-global parallelism is the desired
solution. Rather, each individual step may benefit from its own limit,
and that is what is already happening in the archive. It is this
inside-out approach that we see in debian/rules in some packages. What
I now find missing is better tooling to support it.

> My main concerns would be:
>
>  * Portability.

I am not concerned. The parallelism limit is a mechanism to increase
the efficiency of builder deployments and not much more. The portable
solution is to stuff in more RAM or to supply a lower parallel value
outside-in. A 90% solution is more than good enough here.

>  * Whether this is a local property of the package (so that the
>    maintainer has the needed information to decide on a value, or
>    whether this depends on the builder's setup, or perhaps both).

All of what I wrote in this thread thus far assumed that this is a
local property. That is definitely an oversimplification, as an
upgraded clang, gcc, ghc, or rustc has historically yielded increased
RAM consumption. The affected packages tend to be sensitive to changes
in these packages in other ways, though, so their maintainers
generally know quite closely which dependency versions will be in use
and can tailor their guesses. So while this is a non-local property in
principle, my expectation is that treating it as if it were local is
good enough for a 90% solution.

>  * We might need a way to percolate these parameters to children of
>    the build/test system (as Paul has mentioned), where some times
>    you cannot specify this directly in the parent.
> Setting some
> standardize environment variables would seem sufficient I think,
> but while all this seems kind of optional, this goes a bit into
> reliance on dpkg-buildpackage being the only supported build
> entry point. :)

To me, this reads as an argument for using an inside-out approach.

Given all of the other replies (on-list and off-list), my vision of
how I'd like to see this approached has changed. I see more and more
value in leaving this under the close control of the package
maintainer (i.e. inside-out), to the point where different parts of
the build may use different limits.

How about instead we try to extend coreutils' nproc by adding more
options to it?

    --assume-units=N
    --max-units=N
    --min-ram-per-unit=Z

Then we could continue to use buildopts.mk and other mechanisms to
extract the parallel value passed via DEB_BUILD_OPTIONS as before, and
run it through an nproc invocation before passing it down to a build
system in whatever specific way that build system requires. More
options could be added to nproc as needed.

Helmut
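P.S. To make the plumbing concrete, here is a sketch of extracting
the parallel value from DEB_BUILD_OPTIONS in plain shell and handing
it to such an extended nproc. The parallel_from_options helper is
mine, and the nproc flags shown in the trailing comment are the
proposal, not existing coreutils options:

```shell
#!/bin/sh
# Sketch: pull parallel=N out of DEB_BUILD_OPTIONS, i.e. the value
# that would feed the proposed nproc options. parallel_from_options
# is an illustrative helper, not an existing interface.
parallel_from_options() {
	for opt in $1; do
		case "$opt" in
		parallel=*) echo "${opt#parallel=}"; return ;;
		esac
	done
	echo 1
}

requested=$(parallel_from_options "${DEB_BUILD_OPTIONS:-}")
echo "requested parallelism: $requested"
# With the proposed options, a rules file might then run (hypothetical):
#   jobs=$(nproc --max-units="$requested" --min-ram-per-unit=2048)
#   $(MAKE) -j"$jobs"
```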