Hello,

Following the various postings regarding slurm 19.05 I thought it was an 
opportune time to send this question to the forum.

Like others, I'm awaiting 19.05 primarily for the addition of the XFACTOR 
priority setting, but also for other new/improved features. I'm 
interested to hear how other admins/groups test (and stress) new versions of 
slurm. That is, how do admins test a new version with (a) a realistic workload 
and (b) sufficient hardware resources, without taking too many hardware 
resources from their production cluster and/or annoying too many users? I 
understand that it is possible to emulate a large cluster on SMP nodes by 
firing up many slurm processes on those nodes, for example.

I have been experimenting with a slurm simulator 
(https://github.com/ubccr-slurm-simulator/slurm_sim_tools/blob/master/doc/slurm_sim_manual.Rmd)
 using historical job data; however, that simulator is based on an old version 
of slurm and (to be honest) it's slightly unreliable for serious study. It's 
only useful for broad-brush analysis, at most.

Please let me have your thoughts -- they would be appreciated.

Best regards,
David
