On Thu, 16 Jan 2020 23:24:56 "Lux, Jim (US 337K)" wrote:
What I’m interested in is the idea of jobs that, if spread across many
nodes (dozens) can complete in seconds (<1 minute) providing
essentially “interactive” access, in the context of large jobs taking
days to complete.   It’s not clear to me that the current schedulers
can actually do this – rather, they allocate M of N nodes to a
particular job pulled out of a series of queues, and that job “owns”
the nodes until it completes.  Smaller jobs get run on (M-1) of the N
nodes, and presumably complete faster, so it works down through the
queue quicker, but ultimately, if you have a job that would take, say,
10 seconds on 1000 nodes, it’s going to take 20 minutes on 10 nodes.

Generalizations are prone to failure but here we go anyway...

If there is enough capacity and enough demand for both classes of jobs, one could set up separate queues for the two types, keeping the big jobs and the small ones apart while maintaining pretty much constant utilization.
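
To make that concrete, the routing I have in mind need not be any more complicated than the Python sketch below (the thresholds and queue names are invented, and this is not any particular scheduler's syntax):

# Hypothetical queue-routing rule; thresholds and queue names are made up.
def route_job(nodes_requested, walltime_minutes):
    """Pick a queue so quick, wide jobs never sit behind multi-day runs."""
    if walltime_minutes <= 2 and nodes_requested <= 128:
        return "interactive"   # dozens of nodes, seconds-to-a-minute jobs
    elif walltime_minutes <= 240:
        return "short"
    return "long"              # the multi-day bulk work

print(route_job(nodes_requested=48, walltime_minutes=1))      # -> interactive
print(route_job(nodes_requested=500, walltime_minutes=2880))  # -> long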

In some instances it may be possible to define the benefit (in some unit, let's say dollars) of obtaining a given job's results, and also the costs (in the same units) of node-hours, wait time, and other resources. Using that function it might be possible to schedule the job mix to maximize "value", at least approximately. Based solely on times and node counts, without some measure of benefits and costs, one could optimize node utilization (by some measure), but spinning the CPUs isn't really the point of the resource, right? I expect that whatever job mix maximizes value will also come close to maximizing utilization, but not necessarily the other way around. I bet that AWS's scheduler uses some sort of value calculation like that.
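
As a crude sketch of what I mean (all of the numbers, job names, and the cost model below are invented, and a greedy ordering is only a stand-in for whatever a real optimizer would do):

# Back-of-the-envelope "value" scheduling, not a real scheduler.
NODE_HOUR_COST = 0.50        # assumed dollars per node-hour
WAIT_COST_PER_HOUR = 2.00    # assumed cost of making a job's owner wait

def net_value(job, wait_hours):
    """Benefit of the result minus compute and waiting costs."""
    compute = job["nodes"] * job["hours"] * NODE_HOUR_COST
    waiting = wait_hours * WAIT_COST_PER_HOUR
    return job["benefit"] - compute - waiting

jobs = [
    {"name": "big_sim",  "nodes": 1000, "hours": 48.0,  "benefit": 40000.0},
    {"name": "quick_qc", "nodes":   50, "hours":  0.01, "benefit":   100.0},
]

# Greedy: order jobs by net value per node-hour consumed.
for job in sorted(jobs, reverse=True,
                  key=lambda j: net_value(j, 0) / (j["nodes"] * j["hours"])):
    print(job["name"], round(net_value(job, wait_hours=0), 2))

The tiny job wins on value per node-hour, which is exactly Jim's "interactive" case.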

A somewhat related problem occurs when there are slow jobs which use a lot of memory but cannot benefit from all the CPUs on a node (i.e., they scale poorly). Better utilization is possible if CPU-efficient, low-memory jobs can be run at the same time on those nodes, using the "spare" CPUs. Done just right this is a win-win, with both jobs running at close to their optimal speeds. It is tricky, though: if the total memory usage cannot be calculated ahead of time to be sure there is enough, the two jobs can end up fighting over that resource, with run times going way up when page faulting sets in, or jobs crashing when the system runs out of memory.
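
The admission check I mean looks roughly like this (the safety margin and all of the numbers are invented, and getting reliable per-job memory estimates is of course the hard part):

# Only co-schedule a CPU-bound job next to a memory-hungry one if the
# combined memory still leaves headroom; margin and numbers are made up.
SAFETY_MARGIN = 0.10   # keep 10% of RAM free to avoid paging

def can_coschedule(node_mem_gb, resident_mem_gb, candidate_mem_gb,
                   free_cpus, candidate_cpus):
    """True if the candidate fits in the spare CPUs and spare memory."""
    mem_ok = (resident_mem_gb + candidate_mem_gb
              <= node_mem_gb * (1.0 - SAFETY_MARGIN))
    return mem_ok and candidate_cpus <= free_cpus

# A 512 GB, 64-core node running a poorly scaling 400 GB job on 8 cores:
print(can_coschedule(512, 400, 32, free_cpus=56, candidate_cpus=16))  # True
print(can_coschedule(512, 400, 80, free_cpus=56, candidate_cpus=16))  # False - would risk paging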

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
