On Thu, 16 Jan 2020 23:24:56 "Lux, Jim (US 337K)" wrote:
What I’m interested in is the idea of jobs that, if spread across many
nodes (dozens) can complete in seconds (<1 minute) providing
essentially “interactive” access, in the context of large jobs taking
days to complete.   It’s not clear to me that the current schedulers
can actually do this – rather, they allocate M of N nodes to a
particular job pulled out of a series of queues, and that job “owns”
the nodes until it completes.  Smaller jobs get run on (M-1) of the N
nodes, and presumably complete faster, so it works down through the
queue quicker, but ultimately, if you have a job that would take, say,
10 seconds on 1000 nodes, it’s going to take 20 minutes on 10 nodes.

Generalizations are prone to failure but here we go anyway...

If there is enough capacity and enough demand for both classes of jobs, one could set up separate queues for the two types, keeping the big jobs and the small ones apart while maintaining pretty much constant utilization.
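
To make that concrete, the routing I have in mind need not be any more complicated than the Python sketch below (the thresholds and queue names are invented, and this is not any particular scheduler's syntax):

# Hypothetical queue-routing rule; thresholds and queue names are made up.
def route_job(nodes_requested, walltime_minutes):
    """Pick a queue so quick, wide jobs never sit behind multi-day runs."""
    if walltime_minutes <= 2 and nodes_requested <= 128:
        return "interactive"   # dozens of nodes, seconds-to-a-minute jobs
    elif walltime_minutes <= 240:
        return "short"
    return "long"              # the multi-day bulk work

print(route_job(nodes_requested=48, walltime_minutes=1))      # -> interactive
print(route_job(nodes_requested=500, walltime_minutes=2880))  # -> long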

In some instances it may be possible to define the benefit (in some unit, let's say dollars) of obtaining a given job's results, and also the costs (in the same units) of node-hours, wait time, and other resources. Using that function it might be possible to schedule the job mix to maximize "value", at least approximately. Based solely on times and node counts, without some measure of benefits and costs, one could optimize node utilization (by some measure), but spinning the CPUs isn't really the point of the resource, right? I expect that whatever job mix maximizes value will also come close to maximizing utilization, but not necessarily the other way around. I bet that AWS's scheduler uses some sort of value calculation like that.
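
As a crude sketch of what I mean (all of the numbers, job names, and the cost model below are invented, and a greedy ordering is only a stand-in for whatever a real optimizer would do):

# Back-of-the-envelope "value" scheduling, not a real scheduler.
NODE_HOUR_COST = 0.50        # assumed dollars per node-hour
WAIT_COST_PER_HOUR = 2.00    # assumed cost of making a job's owner wait

def net_value(job, wait_hours):
    """Benefit of the result minus compute and waiting costs."""
    compute = job["nodes"] * job["hours"] * NODE_HOUR_COST
    waiting = wait_hours * WAIT_COST_PER_HOUR
    return job["benefit"] - compute - waiting

jobs = [
    {"name": "big_sim",  "nodes": 1000, "hours": 48.0,  "benefit": 40000.0},
    {"name": "quick_qc", "nodes":   50, "hours":  0.01, "benefit":   100.0},
]

# Greedy: order jobs by net value per node-hour consumed.
for job in sorted(jobs, reverse=True,
                  key=lambda j: net_value(j, 0) / (j["nodes"] * j["hours"])):
    print(job["name"], round(net_value(job, wait_hours=0), 2))

The tiny job wins on value per node-hour, which is exactly Jim's "interactive" case.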

A somewhat related problem occurs when there are slow jobs which use a lot of memory but cannot benefit from all the CPUs on a node (i.e., they scale poorly). Better utilization is possible if CPU-efficient, low-memory jobs can be run at the same time on those nodes, using the "spare" CPUs. Done just right this is a win-win, with both jobs running at close to their optimal speeds. It is tricky, though: if the total memory usage cannot be calculated ahead of time to be sure there is enough, the two jobs can end up fighting over that resource, with run times going way up when page faulting sets in, or jobs crashing when the system runs out of memory.
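
The admission check I mean looks roughly like this (the safety margin and all of the numbers are invented, and getting reliable per-job memory estimates is of course the hard part):

# Only co-schedule a CPU-bound job next to a memory-hungry one if the
# combined memory still leaves headroom; margin and numbers are made up.
SAFETY_MARGIN = 0.10   # keep 10% of RAM free to avoid paging

def can_coschedule(node_mem_gb, resident_mem_gb, candidate_mem_gb,
                   free_cpus, candidate_cpus):
    """True if the candidate fits in the spare CPUs and spare memory."""
    mem_ok = (resident_mem_gb + candidate_mem_gb
              <= node_mem_gb * (1.0 - SAFETY_MARGIN))
    return mem_ok and candidate_cpus <= free_cpus

# A 512 GB, 64-core node running a poorly scaling 400 GB job on 8 cores:
print(can_coschedule(512, 400, 32, free_cpus=56, candidate_cpus=16))  # True
print(can_coschedule(512, 400, 80, free_cpus=56, candidate_cpus=16))  # False - would risk paging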

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
