Re: [slurm-users] 4 sockets but "

Ole Holm Nielsen Fri, 23 Jul 2021 03:45:15 -0700

Hi Diego,

On 7/23/21 12:36 PM, Diego Zuccato wrote:

I believe that slurmd reports the 15 minute CPU load average to the slurmctld, only. So you got this information already.
Yup. It's just unexpected: if you don't know, you run pestat and see that an idle node does have a very high load :)
My users would think someone is breaking the rules...

Well, Slurm reports the 15-minute load average. I guess users will have to learn that, because we can't print help information every time.

If you run "pestat -F" it will show you (in red color) the nodes where the CPU load is outside the expected range, as given by the number of allocated cores. That covers your situation when 0 CPUs are allocated.
That's how I noticed it.


Yes, pestat can be quite helpful :-)
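
For anyone who wants to see the idea behind that check without pestat, here is a rough sketch. It is not pestat's actual code: the helper name flag_load and the tolerance of 2 are assumptions made up for illustration. It reads lines of "node allocated/idle/other/total load" and prints the nodes whose reported load differs from the allocated core count by more than the tolerance. Nodes with no numeric load (e.g. "N/A") are skipped.

```shell
# Hypothetical helper (not part of pestat): flag nodes whose CPU load is
# far from the number of allocated cores.
# Input lines on stdin: <node> <alloc/idle/other/total> <load>
# $1 = tolerance (assumed default: 2)
flag_load() {
    awk -v tol="${1:-2}" '
        $3 ~ /^[0-9.]+$/ {              # skip nodes without a numeric load
            split($2, c, "/")           # c[1] = allocated CPUs
            d = $3 - c[1]
            if (d < 0) d = -d           # absolute difference
            if (d > tol) print $1, "alloc=" c[1], "load=" $3
        }
    '
}

# On a cluster the input could come from sinfo, for example:
#   sinfo -h -N -o "%N %C %O" | flag_load 2
```

With -N a node that belongs to several partitions is listed once per partition, so the output may contain repeats.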

I'm wondering what information you get from slurmtop, which you're missing from pestat? Maybe an opportunity for improvement :-)
Well, it shows semi-graphically the CPU allocations for the various jobs, so users can tell at a glance if there are useable nodes for their job.


For finding idle nodes, there are better tools:

* sinfo -t idle

* showpartitions (download from https://github.com/OleHolmNielsen/Slurm_tools/tree/master/partitions)

I added a little code to pestat now that calculates the longest hostname (minimum 8, truncated to 20 chars). This is done by querying Slurm with "sinfo -N -O NodeList". Can you try out this new version on your cluster?
Download: https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
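
The width calculation described above can be sketched roughly as follows. This is an illustration only, not the code in pestat: the function name hostname_width is hypothetical. It reads node names on stdin, takes the longest one, and clamps the result to the range 8..20.

```shell
# Hypothetical helper (not the pestat source): compute the hostname
# column width from node names on stdin, at least 8 and at most 20.
hostname_width() {
    awk -v min=8 -v max=20 '
        BEGIN { w = 0 }
        { n = length($1); if (n > w) w = n }   # longest name seen so far
        END {
            if (w < min) w = min               # minimum width 8
            if (w > max) w = max               # truncate to 20 chars
            print w
        }
    '
}

# On a cluster:
#   sinfo -h -N -O NodeList | hostname_width
```

The -h flag only suppresses the sinfo header line, so the "NODELIST" heading is not counted as a hostname.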

...

Once fixed, it seems to work OK and columns are aligned. Not the first time long names give us problems :( (users are even worse...).

Oops, I fixed this bug in the master branch now, thanks!

/Ole
