Re: [slurm-users] Nodes stay drained no matter what I do

2023-08-24 Thread Patrick Goetz
Hi Mick - Thanks for these suggestions. I read over both release notes, but didn't find anything helpful. Note that I didn't include gres.conf in my original post. That would be this:
NodeName=titan-[3-15] Name=gpu File=/dev/nvidia[0-7]
NodeName=dgx-2 Name=gpu File=/dev/nvidia[0-6]
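
As a quick sanity check on the bracket ranges in the gres.conf lines quoted above, here is a minimal sketch that expands them and counts the entries. The helper `expand_bracket_range` is hypothetical and written only for illustration; it is not Slurm's own hostlist parser and handles just a single `[lo-hi]` range.

```python
import re
from typing import List


def expand_bracket_range(pattern: str) -> List[str]:
    """Expand one 'prefix[lo-hi]suffix' range (hypothetical helper, not Slurm code).

    Returns [pattern] unchanged if it contains no bracket range.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, low, high, suffix = match.groups()
    # Both ends of the range are inclusive, as in nvidia[0-7] -> nvidia0..nvidia7.
    return [f"{prefix}{i}{suffix}" for i in range(int(low), int(high) + 1)]


# Values copied from the gres.conf lines in the message above.
titan_nodes = expand_bracket_range("titan-[3-15]")
titan_gpus = expand_bracket_range("/dev/nvidia[0-7]")
dgx_gpus = expand_bracket_range("/dev/nvidia[0-6]")

print(len(titan_nodes), len(titan_gpus), len(dgx_gpus))  # prints: 13 8 7
```

So the titan nodes each list 8 device files and dgx-2 lists 7. These counts can be compared against the Gres= value on the matching NodeName lines in slurm.conf; the message does not say whether a mismatch was involved here.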

Re: [slurm-users] Nodes stay drained no matter what I do

2023-08-24 Thread Patrick Goetz
Hi Rob - Thanks for this suggestion. I'm sure I restarted slurmd on the nodes multiple times with nothing in the slurm log file on the node, but after
# tail -f /var/slurm-llnl/slurmd.log
# systemctl restart slurmd
I started to get errors in the log which eventually led me to the solutio

Re: [slurm-users] Nodes stay drained no matter what I do

2023-08-24 Thread Timony, Mick
Hi Patrick, You may want to review the release notes for 19.05 and any intermediate versions:
https://github.com/SchedMD/slurm/blob/slurm-19-05-5-1/RELEASE_NOTES
https://github.com/SchedMD/slurm/blob/slurm-18-08-9-1/RELEASE_NOTES
I'd also check the slurmd.log on the compute nodes. It's usuall

Re: [slurm-users] Nodes stay drained no matter what I do

2023-08-24 Thread Groner, Rob
Ya, I agree about the invalid argument not being much help. In times past when I encountered issues like that, I typically tried:
* restart slurmd on the compute node. Watch its log to see what it complains about. Usually it's about memory.
* Set the configuration of the node to whatev

[slurm-users] Nodes stay drained no matter what I do

2023-08-24 Thread Patrick Goetz
Master/Nodes: Ubuntu 20.04, Slurm 19.05.5 (as packaged by Debian)
This is an upgrade from a working Ubuntu 18.04/Slurm 17.x system where I re-used the original slurm.conf (fearing this might cause issues). The hardware is the same. The Master and nodes all use the same slurm.conf, gres.con

Re: [slurm-users] How to use partition option "Hidden"?

2023-08-24 Thread Ryan Novosielski
Our experience was that it only works with AllowGroups, and we probably opened a ticket to confirm since it was a choice we didn’t really want to make. That's the route we went, but it's currently causing us some problems with accounting, because we didn't bother doing some of the accounting stu

[slurm-users] How to use partition option "Hidden"?

2023-08-24 Thread Erin Gwen Roberts
Hello, I would like to set a partition to Hidden, and allow members of the appropriate Account to see this partition when running `sinfo` with no parameters. This partition is configured with AllowAccounts= and AllowGroups=. is a member of the Slurm account , verified with `sacctmgr s

Re: [slurm-users] What is the minimal configuration for a compute node

2023-08-24 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
it can be quite tedious for more complex configurations with multi-level includes It's either identical or configless, as far as I know also. What about changing your subdirectories to filenames (e.g. slurm/paritions/bob.conf -> slurm.partitions.bob.conf), and then doing configless, or just "c

Re: [slurm-users] What is the minimal configuration for a compute node

2023-08-24 Thread Michael Gutteridge
Hi By "minimal config" I'm assuming you mean "just enough config to get the slurmd to run". As far as I'm aware, you really need to have a complete and matching config on each of your daemons- like slurmd literally won't start with differing configs. There is the "NO_CONF_HASH" debug flag to get