On Thu, 16 Jan 2020 23:24:56 "Lux, Jim (US 337K)" wrote:
> What I’m interested in is the idea of jobs that, if spread across many
> nodes (dozens) can complete in seconds (<1 minute) providing
> essentially “interactive” access, in the context of large jobs taking
> days to complete. It’s not clear to me that the current schedulers
> can actually do this – rather, they allocate M of N nodes to a
> particular job pulled out of a series of queues, and that job “owns”
> the nodes until it completes. Smaller jobs get run on (M-1) of the N
> nodes, and presumably complete faster, so it works down through the
> queue quicker, but ultimately, if you have a job that would take, say,
> 10 seconds on 1000 nodes, it’s going to take 20 minutes on 10 nodes.
Generalizations are prone to failure, but here we go anyway...
If there is enough capacity and enough demand for both classes of jobs,
one could set up separate queues for the two types, keeping the big
jobs and the small ones apart while maintaining pretty much constant
utilization.
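
A toy sketch of that kind of split (the pool sizes and the "interactive"
cutoff below are invented for illustration, not any real scheduler's
policy) might look like:

    # Toy sketch: two node pools, one per job class (sizes are arbitrary).
    POOLS = {"interactive": 100, "batch": 900}   # nodes reserved per queue
    free = dict(POOLS)

    def try_start(job):
        """Start a job on its own class's pool if enough nodes are free."""
        small = job["nodes"] <= 64 and job["minutes"] <= 1
        pool = "interactive" if small else "batch"
        if free[pool] >= job["nodes"]:
            free[pool] -= job["nodes"]
            return pool    # runs now; short jobs never wait behind long ones
        return None        # stays queued until its own pool drains

    print(try_start({"nodes": 32, "minutes": 1}))      # -> 'interactive'
    print(try_start({"nodes": 500, "minutes": 3000}))  # -> 'batch'
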
In some instances it may be possible to define the benefit (in some
unit, let's say dollars) of obtaining a given job's results, and also to
define the costs (in the same units) of node-hours, wait time, and
other resources. Using that value function it might be possible to
schedule the job mix to maximize "value", at least approximately.
Based solely on times and node counts, without some measure of benefit
and cost, it might be possible to optimize node utilization (by some
measure), but just keeping the CPUs spinning isn't really the point of
the resource, right? I expect that whatever job mix maximizes value
will also tend to maximize utilization, but not necessarily the other
way around. I bet that AWS's scheduler uses some sort of value
calculation like that.
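
To make that concrete, here is a rough sketch of the idea (all the
dollar figures and job parameters are invented, and a real scheduler
would need a proper optimizer rather than this greedy pass):

    # Toy value model: value = benefit - node-hour cost - wait cost.
    NODE_HOUR_COST = 0.50      # $/node-hour (made up)
    WAIT_COST      = 2.00      # $/hour of queue wait (made up)

    def value(job, wait_hours):
        run_cost  = job["nodes"] * job["hours"] * NODE_HOUR_COST
        wait_cost = wait_hours * WAIT_COST
        return job["benefit"] - run_cost - wait_cost

    jobs = [
        {"name": "big",   "nodes": 1000, "hours": 48.0, "benefit": 30000.0},
        {"name": "small", "nodes": 20,   "hours": 0.2,  "benefit": 50.0},
    ]

    # Greedy: run the job with the best value per node-hour first; a real
    # system would solve something closer to a packing/knapsack problem.
    order = sorted(jobs,
                   key=lambda j: value(j, 0) / (j["nodes"] * j["hours"]),
                   reverse=True)
    print([j["name"] for j in order])   # -> ['small', 'big']
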
A somewhat related problem occurs when there are slow jobs which use a
lot of memory but cannot benefit from all the CPUs on a node (i.e.,
they scale poorly). Better utilization is possible if CPU-efficient,
low-memory jobs can run at the same time on those nodes, using the
"spare" CPUs. Done just right this is a win-win, with both jobs
running at close to their optimal speeds. It is tricky, though: if the
total memory usage cannot be calculated ahead of time to be sure there
is enough, the two jobs can end up fighting over that resource, with
run times going way up once page faulting starts, or jobs crashing
outright when the system runs out of memory.
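
The back-of-the-envelope check is simple enough (node size, per-job
requests, and the safety margin below are invented); the hard part in
practice is that the memory numbers are rarely known this precisely:

    # Toy co-scheduling check: do two jobs fit within one node's cores
    # and RAM, with some headroom so paging never starts?
    NODE = {"cores": 64, "mem_gb": 256}

    slow_big_mem   = {"cores": 8,  "mem_gb": 200}  # scales poorly, memory-bound
    fast_small_mem = {"cores": 48, "mem_gb": 16}   # CPU-bound, small footprint

    def fits_together(a, b, node, mem_margin_gb=16):
        """True if both jobs fit on the node with a memory safety margin."""
        return (a["cores"] + b["cores"] <= node["cores"] and
                a["mem_gb"] + b["mem_gb"] + mem_margin_gb <= node["mem_gb"])

    print(fits_together(slow_big_mem, fast_small_mem, NODE))   # -> True
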
Regards,
David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech