Bug#1091394: nproc: add new option to reduce emitted processors by system memory

Helmut Grohne Fri, 27 Dec 2024 03:57:25 -0800

Control: tags -1 + wontfix
Control: close -1

Hi Michael,

On Thu, Dec 26, 2024 at 02:57:12PM -0500, Michael Stone wrote:
> On Thu, Dec 26, 2024 at 09:01:30AM +0100, Helmut Grohne wrote:
> > What other place would be suitable for including this functionality?
> 
> As I suggested: you need two tools or one new tool because what you're
> looking for is the min of ncpus and (available_mem / process_size). The
> result of that calculation is not the "number of cpus", it is the number of
> processes you want to run.
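
As a rough illustration of the calculation quoted above, here is a
minimal Python sketch. The function name parallel_jobs and both
parameter names are hypothetical, and flooring the result at 1 is an
assumption made here so that a build can always start; neither comes
from the patch under discussion.

```python
import os
from typing import Optional


def parallel_jobs(available_bytes: int, bytes_per_job: int,
                  ncpus: Optional[int] = None) -> int:
    """Return min(ncpus, available_bytes // bytes_per_job), at least 1.

    All names here are illustrative; the floor of 1 is an assumption.
    """
    if bytes_per_job <= 0:
        raise ValueError("bytes_per_job must be positive")
    if ncpus is None:
        # os.cpu_count() may return None when the count is unknown.
        ncpus = os.cpu_count() or 1
    return max(1, min(ncpus, available_bytes // bytes_per_job))


GIB = 1024 ** 3
print(parallel_jobs(8 * GIB, 2 * GIB, ncpus=16))   # memory-bound: 4
print(parallel_jobs(64 * GIB, 2 * GIB, ncpus=16))  # CPU-bound: 16
```

The result is a job count rather than a CPU count, which is the
distinction made in the quote.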

This reinforces the question I asked in my previous mail: what use case
does nproc solve? There I argued that changing circumstances have broken
a significant fraction of what I see as its use cases.

> Here's the problem: the definition of "available memory" is very vague.
> `free -hwv` output from a random machine:

There is no question about that. You are looking at it from a different
angle than I am though. Perfection is not the goal here. The goal is
guessing better than we currently do. There are two kinds of errors we
may make here.

We may guess a higher concurrency than actually works. This is the
status quo and it causes failing builds. As a result, we have been
limiting the number of processors available to build machines, thus
reducing efficiency. So whatever we do here can hardly be worse than the
status quo.

We may guess a lower concurrency than actually works. In this case, we
slow down builds. To a certain extent, this will happen. In return, we
get fewer failing builds and a higher available concurrency for the
majority of builds that do not require huge amounts of RAM. We are not
optimizing build latency here, but build throughput, while also reducing
spurious build failures. Accepting this error is part of the proposed
strategy.

> IMO, there is no good answer to that question. It's going to vary based on
> how/whether virtual memory is implemented, the purpose of the system (e.g.,
> is it dedicated to building this one thing or does it have other roles that
> shouldn't be impacted), the particulars of the build process (is reducing
> disk cache better or worse than reducing ||ism?), etc.--and we haven't even
> gotten to cgroups or other esoteric factors yet. Long before asking where
> nmem should go, you'd need to figure out how nmem would work. You're

This is exactly why I supplied a patch, right? I am past figuring out
how it should work, as I have now translated the proposed
implementation into a third programming language. As far as I can see,
it works for the typical build machine that does little beyond compiling
software.

> implicitly looking for this tool to be portable (or else, what's wrong with
> using /proc/meminfo directly?) but I don't have any idea how that would
> work. You'd need to somehow get people to define policies, what would that
> look like? I'd suggest starting by writing a proof of concept and shopping
> it around to get buy-in and/or see if it's useful. The answers you get from
> someone doing HPC on linux may be different from the administrator of an
> openbsd server or a developer on an OS/X laptop or windows desktop. I'm
> personally skeptical that this is a problem that can be solved, but maybe
> you'll be able to demonstrate otherwise. At any rate, looking for a project
> to host & distribute the tool would seem to be just about the last step.
> Actually naming the thing won't be easy either, but showing how it works is
> probably a better place to start.
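
For reference, the Linux-only approach of reading /proc/meminfo directly,
as mentioned in the quote, could look roughly like the sketch below. It
assumes the MemAvailable field is the right input, which is precisely
the policy question being debated, and the function name
mem_available_bytes is hypothetical. It is not the heuristic from the
patch.

```python
def mem_available_bytes(meminfo_text: str) -> int:
    """Extract MemAvailable from /proc/meminfo-style text, in bytes.

    Using MemAvailable is an assumption made for illustration only.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Format: "MemAvailable:    8388608 kB" (kB means 1024 bytes).
            return int(line.split()[1]) * 1024
    raise KeyError("MemAvailable not found")


# Sample input with made-up values, so the sketch runs on any OS.
SAMPLE = """\
MemTotal:       16315432 kB
MemFree:         1234567 kB
MemAvailable:    8388608 kB
"""
print(mem_available_bytes(SAMPLE))  # 8589934592
```

On a Linux machine, the text would come from
open("/proc/meminfo").read(). Other systems have no such file, which is
the portability concern raised in the quote.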

Your resistance is constructive. Both of us agree that the proposed
heuristic falls short in a number of situations and will need
improvements to cover more of them. Iterating on this via repeated
coreutils updates is likely a disservice to users, as it causes long
iteration times and renders coreutils (or part of it)
unreliable/unstable. As a result, you suggest self-hosting it at least
for a while. I initially disregarded this option because it looked like
such a simple feature, but your reasoning increasingly convinces me
that it is not as simple as originally anticipated. Doing it as a new
upstream project actually has some merit, as the number of expected users
is fairly low.

Thanks for engaging in this discussion and clarifying your views, as that
moved the discussion forward. You made me agree that coreutils is not a
good place (at least not now). Your and Guillem's earlier feedback in
particular significantly changed the way I look at this.

Helmut
