[slurm-users] Doubts with Fairshare

2020-12-01 Thread Gestió Servidors
Hello, my SLURM cluster applies "FairShare" with these values:
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityCalcPeriod=5
PriorityUsageResetPeriod=QUARTERLY
PriorityFavorSmall=NO
PriorityMaxAge=7-0
PriorityWeightAge=1
PriorityWeightFairshare=100
PriorityWeightJobS
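To see what the quoted settings imply, here is a minimal sketch, assuming the documented multifactor behaviour: recorded usage decays exponentially with PriorityDecayHalfLife (applied in small steps every PriorityCalcPeriod minutes), the age factor grows linearly until it reaches 1.0 at PriorityMaxAge, and each factor (0.0 to 1.0) is multiplied by its weight. Only the age and fair-share terms are modelled because the rest of the configuration is cut off; the function names are hypothetical helpers, not Slurm APIs.

```python
import math

# Values taken from the slurm.conf snippet in the post (times in minutes).
HALF_LIFE_MIN = 7 * 24 * 60   # PriorityDecayHalfLife=7-0 (7 days)
CALC_PERIOD_MIN = 5           # PriorityCalcPeriod=5
MAX_AGE_MIN = 7 * 24 * 60     # PriorityMaxAge=7-0
WEIGHT_AGE = 1                # PriorityWeightAge=1
WEIGHT_FAIRSHARE = 100        # PriorityWeightFairshare=100


def decay_per_period(half_life=HALF_LIFE_MIN, period=CALC_PERIOD_MIN):
    """Multiplier applied to recorded usage once per calculation period."""
    return 0.5 ** (period / half_life)


def decayed_usage(usage, minutes_elapsed, half_life=HALF_LIFE_MIN):
    """Usage left after `minutes_elapsed` minutes of exponential decay."""
    return usage * 0.5 ** (minutes_elapsed / half_life)


def age_factor(wait_min, max_age=MAX_AGE_MIN):
    """Linear in queue wait time, capped at 1.0 once PriorityMaxAge is reached."""
    return min(wait_min / max_age, 1.0)


def partial_priority(wait_min, fairshare_factor):
    """Age + fair-share terms only; the other weighted terms are omitted."""
    return WEIGHT_AGE * age_factor(wait_min) + WEIGHT_FAIRSHARE * fairshare_factor


if __name__ == "__main__":
    print(decayed_usage(1000.0, HALF_LIFE_MIN))          # 500.0
    print(age_factor(MAX_AGE_MIN / 2))                    # 0.5
    print(partial_priority(2 * MAX_AGE_MIN, 0.5))         # 51.0
```

With these weights the fair-share term can contribute up to 100 while the age term never exceeds 1, so in this sketch fair-share dominates the ordering.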

Re: [slurm-users] Doubts with Fairshare

2020-12-01 Thread Renfro, Michael
Harvard's Arts & Sciences Research Computing group has a good explanation of these columns at https://docs.rc.fas.harvard.edu/kb/fairshare/ -- might not answer your exact question, but it does go into how the FairShare column is calculated. From: slurm-users Date: Tuesday, December 1, 2020 at
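As a rough companion to the `sshare` columns, the sketch below assumes Slurm's classic fair-share formula, F = 2^(-U/S/d), where U is EffectvUsage, S is NormShares and d is FairShareDampeningFactor (default 1). Recent Slurm releases default to the Fair Tree algorithm, which derives the FairShare column from a ranking instead, so treat this as an illustration of the classic behaviour only; `classic_fairshare` is a hypothetical helper name.

```python
def classic_fairshare(effective_usage: float, norm_shares: float,
                      dampening: float = 1.0) -> float:
    """Classic fair-share factor F = 2 ** (-U / S / d).

    F is 1.0 with no usage, 0.5 when usage equals the share,
    and approaches 0.0 as usage grows beyond the share.
    """
    if norm_shares <= 0 or dampening <= 0:
        raise ValueError("norm_shares and dampening must be positive")
    return 2.0 ** (-effective_usage / norm_shares / dampening)


if __name__ == "__main__":
    print(classic_fairshare(0.0, 0.25))   # 1.0  (unused share)
    print(classic_fairshare(0.25, 0.25))  # 0.5  (used exactly its share)
    print(classic_fairshare(0.5, 0.25))   # 0.25 (used twice its share)
```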

Re: [slurm-users] Kill task failed, state set to DRAINING, UnkillableStepTimeout=120

2020-12-01 Thread William Markuske
Hello Robert, I've been having the same issue with BCM (CentOS 8.2, BCM 9.0, Slurm 20.02.3). It seems to have started when I enabled proctrack/cgroup and changed select/linear to select/cons_tres. Are you using cgroup process tracking, and have you modified the cgroup.conf file? Do jo

[slurm-users] how do slurm schedule health check when setting "HealthCheckNodeState=CYCLE"

2020-12-01 Thread taleintervenor
Hello, our Slurm cluster manages 600+ nodes, and I tested setting HealthCheckNodeState=CYCLE in slurm.conf. According to the slurm.conf manual, setting this to CYCLE should cause Slurm to "cycle through running on all compute nodes through the course of the HealthCheckInterval". So I set "HealthCheckI
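One way to reason about the expected behaviour is an idealized model in which CYCLE spreads the health checks evenly over one HealthCheckInterval. This is a simplification, not Slurm's actual scheduling code (which may batch nodes differently), and the 3600-second interval below is a hypothetical value because the post is truncated before it states the real one; `cycle_offsets` is likewise a hypothetical helper.

```python
from typing import List


def cycle_offsets(num_nodes: int, interval_s: float) -> List[float]:
    """Idealized start offsets (seconds) if checks were spread evenly over one interval."""
    if num_nodes <= 0 or interval_s <= 0:
        raise ValueError("num_nodes and interval_s must be positive")
    step = interval_s / num_nodes  # gap between consecutive node checks
    return [i * step for i in range(num_nodes)]


if __name__ == "__main__":
    offsets = cycle_offsets(600, 3600)  # 600 nodes, hypothetical 1-hour interval
    print(offsets[1] - offsets[0])      # 6.0 seconds between checks
    print(offsets[-1])                  # 3594.0, last node starts just before the interval ends
```

If the observed timing differs strongly from this even spread, that difference is what the question above is probing.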