> On Jun 12, 2018, at 11:08 AM, Prentice Bisbal <pbis...@pppl.gov> wrote:
>
> On 06/12/2018 12:33 AM, Chris Samuel wrote:
>
>> Hi Prentice!
>>
>> On Tuesday, 12 June 2018 4:11:55 AM AEST Prentice Bisbal wrote:
>>
>>> To make this work, I will be using job_submit.lua to apply this logic
>>> and assign a job to a partition. If a user requests a specific partition
>>> not in line with these specifications, job_submit.lua will reassign the
>>> job to the appropriate QOS.
>>
>> Yeah, that's very much like what we do for GPU jobs (redirect them to the
>> partition with access to all cores, and ensure non-GPU jobs go to the
>> partition with fewer cores) via the submit filter at present.
>>
>> I've already coded up something similar in Lua for our submit filter (one
>> that only affects my jobs, for testing purposes), but I still need to
>> handle memory correctly; in other words, only pack jobs when the per-task
>> memory request * tasks per node < node RAM (for now we'll let jobs where
>> that's not the case go through to the keeper for Slurm to handle as now).
>>
>> However, I do think Scott's approach is potentially very useful: directing
>> jobs < full node to one end of a list of nodes, and jobs that want full
>> nodes to the other end of the list (especially if you use the partition
>> idea to ensure that not all nodes are accessible to small jobs).
>>
> This was something that was very easy to do with SGE. It's been a while
> since I worked with SGE, so I forget the details, but in essence you could
> assign nodes a 'serial number' specifying the preferred order in which
> nodes would be assigned to jobs, and I believe that order was specific to
> each queue. So if you had 64 nodes, one queue could assign jobs starting at
> node 1 and work its way up to node 64, while another queue could start at
> node 64 and work its way down to node 1. This technique was mentioned in
> the SGE documentation as a way to let MPI and shared-memory jobs share the
> cluster.
>
> At the time, I used it for exactly that purpose, but I didn't think it was
> that big a deal. Now that I don't have that capability, I miss it.
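A minimal sketch of the memory check Chris describes, as it might look in a
job_submit.lua filter. This is not a drop-in script: the NODE_RAM_MB value
and the "packed" partition name are placeholders, and the job_desc field
names (pn_min_memory, ntasks_per_node, cpus_per_task) and slurm.NO_VAL*
sentinels should be verified against the job_submit/lua documentation for
your Slurm release.

    -- NODE_RAM_MB and the "packed" partition are assumptions.
    local NODE_RAM_MB      = 192 * 1024  -- assumed RAM per node, in MB
    local MEM_PER_CPU_FLAG = 2^63        -- high bit set => memory is per-CPU

    function slurm_job_submit(job_desc, part_list, submit_uid)
        local mem   = job_desc.pn_min_memory
        local tasks = job_desc.ntasks_per_node

        -- Under-specified requests go through to the keeper, as now.
        if mem == nil or mem == slurm.NO_VAL64 or
           tasks == nil or tasks == 0 or tasks == slurm.NO_VAL16 then
            return slurm.SUCCESS
        end

        -- Reduce the request to a per-node figure in MB.
        local per_node_mb
        if mem >= MEM_PER_CPU_FLAG then
            -- Per-CPU request: strip the flag, scale by CPUs/task, tasks/node.
            local cpus = job_desc.cpus_per_task or 1
            per_node_mb = (mem - MEM_PER_CPU_FLAG) * cpus * tasks
        else
            per_node_mb = mem  -- already a per-node request
        end

        -- The condition above: per-task memory * tasks per node < node RAM.
        if per_node_mb < NODE_RAM_MB then
            job_desc.partition = "packed"  -- hypothetical packing partition
        end
        return slurm.SUCCESS
    end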
SLURM has the ability to do priority "weights" for nodes as well, to somewhat
the same effect, as far as I know. At our site, though, that does not work,
as it apparently conflicts with the topology plugin (which we also use)
rather than layering with it or doing something more useful.

--
 ____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'
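For reference, node weights are set per node in slurm.conf, and Slurm prefers
lower-weight nodes when all else is equal. A sketch of the ordering effect
described above, with node names, sizes, and weight values that are purely
illustrative:

    # Small jobs fill node[01-32] first; node[33-64] tend to stay emptier
    # and thus available for full-node work. All values are illustrative.
    NodeName=node[01-32] CPUs=32 RealMemory=192000 Weight=1
    NodeName=node[33-64] CPUs=32 RealMemory=192000 Weight=100

As noted above, sites running the topology plugin should verify that the two
interact as expected.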