In this case, I would run LINPACK on each generation of node (either the
full node or just one core), and then normalize the results. I would
recommend using the performance of a single core of the slowest node as
your basis for normalization, so that it has a multiplier of 1 and the
newer systems have multipliers greater than 1. You can then multiply
that per-core multiplier by the number of cores in your different
systems to get a final multiplier for a whole node, if needed.
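For example, the arithmetic I have in mind, with made-up HPL numbers and
core counts (substitute your own measurements):

    # Made-up per-core HPL (LINPACK) results in GFLOPS; real values would
    # come from your own runs on each node generation.
    gflops_per_core = {"sandy": 18.0, "haswell": 30.0, "skylake": 45.0}
    cores_per_node  = {"sandy": 16,   "haswell": 24,   "skylake": 36}

    base = gflops_per_core["sandy"]   # slowest core is the basis (multiplier 1.0)

    # Per-core multiplier relative to the slowest core.
    core_mult = {arch: g / base for arch, g in gflops_per_core.items()}

    # Whole-node multiplier, if you need one.
    node_mult = {arch: core_mult[arch] * cores_per_node[arch] for arch in core_mult}

    print(core_mult)   # sandy 1.0, haswell ~1.67, skylake 2.5
    print(node_mult)   # sandy 16.0, haswell 40.0, skylake 90.0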
Prentice
On 6/19/19 3:30 PM, Fulcomer, Samuel wrote:
(...and yes, the name is inspired by a certain OEM's software
licensing schemes...)
At Brown we run a ~400-node cluster containing nodes of multiple
architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade),
purchased in some cases with University funds and in others with
investigator funding (~50:50). They all appear in the default SLURM
partition. We have three classes of SLURM users:
1. Exploratory - no-charge access to up to 16 cores
2. Priority - $750/quarter for access to up to 192 cores (and with a
GrpTRESRunMins=cpu limit). Each user has their own QoS
3. Condo - an investigator group who paid for nodes added to the
cluster. The group has its own QoS and SLURM Account. The QoS
allows use of the number of cores purchased and has a much higher
priority than the QoSes of the "priority" users.
The first problem with this scheme is that condo users who have
purchased the older hardware now have access to the newest without
penalty. In addition, we're encountering resistance to the idea of
turning off their hardware and terminating their condos (despite MOUs
stating a 5-year life). The pushback is the stated belief that the
hardware should run until it dies.
What I propose is a new TRES called a Processor Performance Unit (PPU)
that would be specified on the Node line in slurm.conf, and used such
that usage counted against a GrpTRES=ppu=N limit was calculated as the
number of allocated cores multiplied by their associated PPU values.
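As a rough sketch of the accounting I have in mind (the "ppu" TRES
doesn't exist in SLURM today, and the per-core weights below are made
up):

    # Hypothetical per-core PPU weights per architecture.
    ppu_per_core = {"sandy": 1.0, "haswell": 1.6, "skylake": 2.2}

    def job_ppu(alloc):
        """alloc maps architecture -> cores allocated on that architecture."""
        return sum(cores * ppu_per_core[arch] for arch, cores in alloc.items())

    # A 64-core job split across generations would count as:
    print(job_ppu({"sandy": 32, "skylake": 32}))   # 32*1.0 + 32*2.2 = 102.4 ppu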
We could then assign a base PPU to the oldest hardware, say "1" for
Sandy/Ivy, and increase it for later architectures based on their
performance improvement. We'd set the condo QoS to
GrpTRES=ppu=N*X+M*Y,..., where N is the number of cores of the oldest
architecture, X is the configured PPU per core for that architecture,
and M*Y (and so on) repeats the calculation for any newer nodes/cores
the investigator has purchased since.
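For a hypothetical condo that bought 128 Sandy cores and later added 64
Skylake cores (illustrative counts and weights only):

    # N Sandy cores at PPU X, plus a later purchase of M Skylake cores at PPU Y.
    N, X = 128, 1.0
    M, Y = 64, 2.2
    grptres_ppu = N * X + M * Y
    print(grptres_ppu)   # 128*1.0 + 64*2.2 = 268.8

    # 268.8 ppu covers all 192 purchased cores when running on the original
    # mix, but only ~122 cores (268.8 / 2.2) if the group runs entirely on
    # the newest Skylake nodes.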
The result is that the investigator group gets to run on an
approximation of the performance that they've purchased, rather than on
the raw purchased core count.
Thoughts?