Janne, thank you. That FGCI benchmark in a container is pretty smart. I
always say that real application benchmarks beat synthetic ones; taking a
small mix of applications like that and reporting the geometric mean is a
great approach.
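In case it's useful, here's a rough sketch of that normalization in Lua
(chosen only because job_submit plugins are written in Lua; the
application names and runtimes are invented for illustration):

    -- Sketch: derive a billing weight as the geometric mean of
    -- per-application speedups against a reference machine.
    -- All names and timings below are made-up example data.
    local ref  = { app_a = 120.0, app_b = 300.0, app_c = 240.0 }  -- reference runtimes (s)
    local node = { app_a =  80.0, app_b = 210.0, app_c = 150.0 }  -- runtimes on this node type

    local log_sum, n = 0.0, 0
    for app, t_ref in pairs(ref) do
      log_sum = log_sum + math.log(t_ref / node[app])  -- speedup vs. reference
      n = n + 1
    end
    local weight = math.exp(log_sum / n)  -- geometric mean of the speedups
    print(string.format("billing weight: %.2f", weight))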
Note: *"a reference result run on a Dell PowerEdge C4130"* In the old days CERN had a standard unit of compute, which was equivalent to a VAX. I am sure that unit has long been retired. Though I must say that having participated in CERN tenders a few years ago they use SpecFP measurements to compare systems. On Thu, 20 Jun 2019 at 07:41, Janne Blomqvist <janne.blomqv...@aalto.fi> wrote: > On 19/06/2019 22.30, Fulcomer, Samuel wrote: > > > > (...and yes, the name is inspired by a certain OEM's software licensing > > schemes...) > > > > At Brown we run a ~400 node cluster containing nodes of multiple > > architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade) purchased > > in some cases by University funds and in others by investigator funding > > (~50:50). They all appear in the default SLURM partition. We have 3 > > classes of SLURM users: > > > > 1. Exploratory - no-charge access to up to 16 cores > > 2. Priority - $750/quarter for access to up to 192 cores (and with a > > GrpTRESRunMins=cpu limit). Each user has their own QoS > > 3. Condo - an investigator group who paid for nodes added to the > > cluster. The group has its own QoS and SLURM Account. The QoS allows > > use of the number of cores purchased and has a much higher priority > > than the QoS' of the "priority" users. > > > > The first problem with this scheme is that condo users who have > > purchased the older hardware now have access to the newest without > > penalty. In addition, we're encountering resistance to the idea of > > turning off their hardware and terminating their condos (despite MOUs > > stating a 5yr life). The pushback is the stated belief that the hardware > > should run until it dies. > > > > What I propose is a new TRES called a Processor Performance Unit (PPU) > > that would be specified on the Node line in slurm.conf, and used such > > that GrpTRES=ppu=N was calculated as the number of allocated cores > > multiplied by their associated PPU numbers. > > > > We could then assign a base PPU to the oldest hardware, say, "1" for > > Sandy/Ivy and increase for later architectures based on performance > > improvement. We'd set the condo QoS to GrpTRES=ppu=N*X+M*Y,..., where N > > is the number of cores of the oldest architecture multiplied by the > > configured PPU/core, X, and repeat for any newer nodes/cores the > > investigator has purchased since. > > > > The result is that the investigator group gets to run on an > > approximation of the performance that they've purchased, rather on the > > raw purchased core count. > > > > Thoughts? > > > > > > What we do is that our nodes are grouped into separate partitions based > on the CPU model. E.g. the partition "batch-skl" is where our Skylake > (6148) nodes are. The we have a job_submit.lua script which sends jobs > without an explicit partition spec to all batch-xxx partitions (checking > constraints etc. along the way). Then for each partition we set > TRESBillingWeights= to "normalize" the fairshare consumption based on > the geometric mean of a set of hopefully not too unrepresentative > single-node benchmarks [1]. > > We also set a memory billing weight, and have MAX_TRES among our > PriorityFlags, approximating dominant resource fairness (DRF) [2] > > [1] https://github.com/AaltoScienceIT/docker-fgci-benchmark > > [2] https://people.eecs.berkeley.edu/~alig/papers/drf.pdf > > -- > Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist > Aalto University School of Science, PHYS & NBE > +358503841576 || janne.blomqv...@aalto.fi > >