Follow-up Comment #5, bug #42288 (group make):

I like the -l memory_limit idea. If less than memory_limit RAM is available,
don't start new jobs.
I think we should focus on how much RAM is free on the system. It doesn't
matter how much memory the jobs use, nor how much memory other processes on
the system use. What matters is whether there's enough space available now to
start a new job.
I can confirm that when I talk about memory, I am not talking about swap at
all. I'm talking about RAM.
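As a rough sketch of what "RAM, not swap" could mean in practice on Linux: read /proc/meminfo and look only at the RAM figure, ignoring the Swap* lines. The function names here are made up for illustration, and using MemAvailable (falling back to MemFree on kernels that lack it) is my assumption about which number best matches "free"; nothing in make does this today.

```python
def available_ram_bytes(meminfo_text):
    """Return MemAvailable (or MemFree as a fallback) in bytes.

    Swap figures are parsed like any other field but never consulted.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Only lines of the form "Name:  12345 kB" carry a size.
        if sep and len(parts) == 2 and parts[1] == "kB":
            fields[name] = int(parts[0]) * 1024
    return fields.get("MemAvailable", fields.get("MemFree"))


def read_available_ram():
    # Linux-only: /proc/meminfo does not exist on other systems.
    with open("/proc/meminfo") as f:
        return available_ram_bytes(f.read())
```

Other systems would need their own way of asking the same question.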
I don't know about you, but as long as there is memory for all the processes,
I don't care if my CPU load is at 1.00 or 100.00. I already have niceness,
and io-niceness, with which to control that pile of work. Also, that pile of
work is work that must be done. Any CPU, memory, or disk required by that
work is part of the job.
What is not required is virtual memory swapping. Any time swapping occurs the
system is doing way more work than is necessary, and is using the slowest
hardware to do it: disk.
Therefore the primary goal of this feature is to avoid swap if possible, and
to stop thrashing if not.
Take an example Linux system with no swap: when RAM runs out, the OOM killer
kills processes outright. A smart make could restart any jobs that were killed
that way, but then we run into the same problem: doing unnecessary work, by
doing it twice.
Ways to limit memory usage:
1. Some systems have ulimits, both soft and hard. I've never managed to make
this work practically. If a job needs the resources, it just won't complete
without them, so it's a matter of juggling or priority rather than setting
hard limits.
2. The -l limit feature: don't start a process unless the current amount of
free RAM is at or above the expected threshold.
Both of these techniques still leave open the possibility of thrashing in the
case where N jobs are started and they begin using so much memory that
swapping begins.
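The -l check in technique 2 could be as small as the sketch below. The function and parameter names are hypothetical. Letting the first job through regardless of free RAM is my own assumption, added so that a build can always make progress.

```python
def may_start_job(available_bytes, memory_limit_bytes, running_jobs):
    """Decide whether a new job may be started under the proposed -l check."""
    if running_jobs == 0:
        # Assumption: always allow one job, or the build could stall forever.
        return True
    # Start a job only while at least memory_limit bytes of RAM are free.
    return available_bytes >= memory_limit_bytes
```

Note that this only looks at the moment a job is started. It says nothing about how much the already running jobs will grow afterwards, which is exactly the gap described above.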
What is thrashing? I define it as switching back and forth between two or
more processes that are both stuck in swap. The way to stop thrashing once it
starts is to stop all such processes but one, so that swapping actually helps
one process to finish.
There are two ways to do this:
1. Rely on the OOM killer, or have make itself kill and restart the job.
2. Put all but one job (the one using the most memory) to sleep with SIGSTOP,
restoring them slowly one at a time with SIGCONT until the system has -l limit
memory free again.
Option 2 prevents thrashing, allowing the system to shift the entire excess
job load into swap if necessary in order to finish the largest one. Recovery
from that overloaded situation is gradual: the stopped jobs are woken one by
one until load and memory usage are back in normal range.
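Option 2 might be sketched like this on a POSIX system. All names are hypothetical, the per-job memory figures (rss_by_pid) are assumed to come from somewhere else, and the order in which stopped jobs are woken is arbitrary here. I read "one at a time" as: wake a single job each time the -l amount of RAM is free again.

```python
import os
import signal


def jobs_to_stop(rss_by_pid):
    """Return every pid except the one using the most memory."""
    if not rss_by_pid:
        return []
    largest = max(rss_by_pid, key=rss_by_pid.get)
    return [pid for pid in rss_by_pid if pid != largest]


def stop_jobs(pids, send=os.kill):
    # SIGSTOP cannot be caught or ignored, so the jobs really do pause.
    for pid in pids:
        send(pid, signal.SIGSTOP)


def resume_one(stopped, available_bytes, memory_limit_bytes, send=os.kill):
    """Wake a single stopped job if memory_limit bytes of RAM are free.

    Removes the woken pid from `stopped` and returns it, or returns None
    when nothing was woken.
    """
    if stopped and available_bytes >= memory_limit_bytes:
        pid = stopped.pop()
        send(pid, signal.SIGCONT)
        return pid
    return None
```

A monitor loop would call jobs_to_stop and stop_jobs when swapping is detected, then call resume_one periodically. How to detect that swapping has started is left open here.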
Both options give a clear, finite path to job completion, while thrashing, if
left unattended, may not finish in any reasonable timeframe.
I have done this manually, running top in memory sorted mode, and stopping
compilers as needed in order to finish, and then restarting them when
resources free up. If this were automated, I think it could work.
In the end, the Make user would set -l to the approximate memory size of the
average job, perhaps slightly larger. Make would then take care of the rest.
As long as there is enough swap for (N-1)*memory_limit bytes, -jN would be
safe to walk away from, leaving Make to monitor and finish the build.
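To put a number on the (N-1)*memory_limit figure, here is the arithmetic as a tiny helper. The name is made up, and the 2 GiB limit is only an example value.

```python
GIB = 1024 ** 3


def swap_needed_bytes(n_jobs, memory_limit_bytes):
    """Swap needed to park all but one of n_jobs, each memory_limit in size."""
    return max(n_jobs - 1, 0) * memory_limit_bytes


# Example: -j8 with a 2 GiB limit needs room for 7 parked jobs.
print(swap_needed_bytes(8, 2 * GIB) // GIB)  # prints 14
```

This treats memory_limit as a stand-in for the size of each job, so a job much larger than the average would need more.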
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?42288>
