We're a Grid Engine shop, and we have the execd/shepherds place each job in its own cgroup with CPU and memory limits in place. This lets our users make efficient use of our HPC resources whether they're running single-slot jobs or multi-node jobs. Unfortunately, we don't have a mechanism to limit network or local scratch usage. The former is becoming less of a problem with faster edge networking, and for the latter we have an opt-in bookkeeping mechanism that isn't enforced but works well enough to keep people happy.
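
For anyone curious, here's a rough sketch of the sort of thing that ends up happening under the hood. This is not our actual code (the real work is done by execd/shepherd); it just assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup with a made-up "gridengine" subtree, and the job ID, shepherd PID, and limits are hypothetical inputs:

    #!/usr/bin/env python3
    # Illustrative sketch only: per-job cgroup with CPU and memory caps,
    # assuming cgroup v1 controllers mounted under /sys/fs/cgroup.
    import os

    CGROUP_ROOT = "/sys/fs/cgroup"  # assumption: v1 hierarchy mounted here

    def write(path, value):
        with open(path, "w") as f:
            f.write(str(value))

    def limit_job(job_id, shepherd_pid, ncores, mem_bytes):
        """Put one job's shepherd PID in its own cgroup with CPU/memory limits."""
        for controller in ("cpu", "memory"):
            cg = os.path.join(CGROUP_ROOT, controller, "gridengine", str(job_id))
            os.makedirs(cg, exist_ok=True)
            if controller == "cpu":
                # CFS quota: allow ncores worth of CPU time per 100ms period
                write(os.path.join(cg, "cpu.cfs_period_us"), 100000)
                write(os.path.join(cg, "cpu.cfs_quota_us"), 100000 * ncores)
            else:
                write(os.path.join(cg, "memory.limit_in_bytes"), mem_bytes)
            # Moving the shepherd in means everything the job forks inherits
            # the same limits.
            write(os.path.join(cg, "tasks"), shepherd_pid)

    # Example: 4 slots and 16 GiB for job 123456 whose shepherd is PID 4242
    # limit_job(123456, 4242, ncores=4, mem_bytes=16 * 2**30)

The important bit is that the shepherd goes into the cgroup before the job script starts, so every child process is covered without any cooperation from the job itself.
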
On Fri, Jun 08, 2018 at 05:21:56PM +1000, Chris Samuel wrote:
> Hi all,
>
> I'm curious to know what/how/where/if sites do to try and reduce the impact of
> fragmentation of resources by small/narrow jobs on systems where you also have
> to cope with large/wide parallel jobs?
>
> For my purposes a small/narrow job is anything that will fit on one node
> (whether a single core job, multi-threaded or MPI).
>
> One thing we're considering is to use overlapping partitions in Slurm to have
> a subset of nodes that are available to these types of jobs and then have
> large parallel jobs use a partition that can access any node.
>
> This has the added benefit of letting us set a higher priority on that
> partition to let Slurm try and place those jobs first, before smaller ones.
>
> We're already using a similar scheme for GPU jobs where they get put into a
> partition that can access all 36 cores on a node whereas non-GPU jobs get put
> into a partition that can only access 32 cores on a node, so effectively we
> reserve 4 cores a node for GPU jobs.
>
> But really I'm curious to know what people do about this, or do you not worry
> about it at all and just let the scheduler do its best?
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

-- 
Skylar

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf