Are there any references out there that discuss the tradeoffs between 
interactive and batch scheduling (perhaps some from the 60s and 70s)?
Most big HPC systems have a mix of giant jobs and smaller ones, managed by a 
batch scheduler such as PBS or SLURM, with queues for jobs of various sizes.
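For concreteness, a typical SLURM submission looks something like the sketch
below (the job name, partition name, application, and input file are all
placeholders, not anything specific): the job declares a node count and wall
clock limit, and the scheduler holds it in a queue until that many nodes are
free.

```shell
#!/bin/bash
# Minimal SLURM batch script sketch. Partition and program names are
# assumptions for illustration only.
#SBATCH --job-name=big_sim
#SBATCH --nodes=1000             # ask for M of the cluster's N nodes
#SBATCH --time=2-00:00:00        # wall clock limit: two days
#SBATCH --partition=batch        # assumed partition/queue name

srun ./my_solver input.dat       # placeholder application and input
```

Once the job starts, it owns those 1000 nodes until it finishes or hits the
time limit, which is exactly the behavior in question.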

What I’m interested in is the idea of jobs that, if spread across many nodes 
(dozens), can complete in seconds (<1 minute), providing essentially 
“interactive” access, in a context where large jobs take days to complete.
It’s not clear to me that current schedulers can actually do this. Rather, 
they allocate M of N nodes to a particular job pulled out of a series of 
queues, and that job “owns” the nodes until it completes. Smaller jobs get run 
on the remaining (N-M) nodes and presumably complete faster, so the scheduler 
works down through the queue quicker; but ultimately, if you have a job that 
would take, say, 10 seconds on 1000 nodes, it’s going to take on the order of 
20 minutes on 10 nodes.
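The arithmetic behind that last claim, assuming ideal (perfectly parallel)
scaling where total work in node-seconds is constant:

```shell
# 10 seconds on 1000 nodes = 10,000 node-seconds of work.
# Spread over only 10 nodes, the same work takes 1000x longer per node count
# ratio: 10000 / 10 = 1000 seconds, i.e. roughly 17 minutes -- on the order
# of the 20 minutes cited (real jobs rarely scale perfectly, so it can be worse).
work_node_seconds=$((10 * 1000))
nodes=10
runtime=$((work_node_seconds / nodes))
echo "${runtime} seconds"
```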

Jim


--

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
