Hi Mick -
Thanks for these suggestions. I read over both release notes, but
didn't find anything helpful.
Note that I didn't include gres.conf in my original post. That would be
this:
NodeName=titan-[3-15] Name=gpu File=/dev/nvidia[0-7]
NodeName=dgx-2 Name=gpu File=/dev/nvidia[0-6]
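For what it's worth, gres.conf entries like these would normally be paired with matching Gres= counts on the node lines in slurm.conf. A rough sketch only, with the GPU counts inferred from the device files above and the other node parameters omitted:

GresTypes=gpu
NodeName=titan-[3-15] Gres=gpu:8 ...
NodeName=dgx-2 Gres=gpu:7 ...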
Hi Rob -
Thanks for this suggestion. I'm sure I restarted slurmd on the nodes
multiple times with nothing in the slurm log file on the node, but after
# tail -f /var/slurm-llnl/slurmd.log
# systemctl restart slurmd
I started to get errors in the log, which eventually led me to the solution.
Hi Patrick,
You may want to review the release notes for 19.05 and any intermediate
versions:
https://github.com/SchedMD/slurm/blob/slurm-19-05-5-1/RELEASE_NOTES
https://github.com/SchedMD/slurm/blob/slurm-18-08-9-1/RELEASE_NOTES
I'd also check the slurmd.log on the compute nodes. It's usuall
Ya, I agree about the "invalid argument" error not being much help.
In times past when I encountered issues like that, I typically tried:
* Restart slurmd on the compute node. Watch its log to see what it
complains about. Usually it's about memory (see the sketch after this list).
* Set the configuration of the node to whatev
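As an illustration (this is my own sketch, not from the original mail): a quick way to check the memory/CPU side is to ask slurmd itself what it detects and compare that against the NodeName line in slurm.conf. The numbers below are made up:

# slurmd -C
NodeName=titan-3 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=128000

If RealMemory or the core counts in slurm.conf claim more than the node actually reports, the node typically ends up invalid/drained with a "Low RealMemory"-style reason in the logs.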
Master/Nodes: Ubuntu 20.04, Slurm 19.05.5 (as packaged by Debian)
This is an upgrade from a working Ubuntu 18.04/Slurm 17.x system where I
re-used the original slurm.conf (fearing this might cause issues). The
hardware is the same. The Master and nodes all use the same slurm.conf and
gres.conf.
Our experience was that it only works with AllowGroups, and we probably opened
a ticket to confirm since it was a choice we didn’t really want to make. That's
the route we went, but it's currently causing us some problems with accounting,
because we didn't bother doing some of the accounting stuff.
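For reference, a minimal sketch of the AllowGroups route in slurm.conf (partition, group and node names here are made up):

PartitionName=private Nodes=node[01-10] Hidden=YES AllowGroups=privgrp State=UP

With that in place, visibility of the hidden partition in sinfo follows Unix group membership rather than the Slurm account, which is the trade-off mentioned above.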
Hello,
I would like to set a partition to Hidden, and allow members of the
appropriate Account to see this partition when running `sinfo` with no
parameters.
This partition is configured with AllowAccounts= and
AllowGroups=. The user in question is a member of the Slurm account,
verified with `sacctmgr s
it can be quite tedious for more complex configurations with multi-level
includes
It's either identical or configless, as far as I know also. What about changing
your subdirectories to filenames (e.g. slurm/partitions/bob.conf ->
slurm.partitions.bob.conf), and then doing configless, or just "c
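Just to sketch what that would look like (the file name is the renamed example from above, the host name is made up): after flattening, the include in slurm.conf is a plain sibling file, which configless mode can hand out along with slurm.conf:

# in slurm.conf on the controller
SlurmctldParameters=enable_configless
Include slurm.partitions.bob.conf

# on the nodes, slurmd pointed at the controller instead of a local config
slurmd --conf-server ctld.example.com

(As far as I know, configless only serves config files that sit alongside slurm.conf, which is why the subdirectory layout gets in the way.)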
Hi
By "minimal config" I'm assuming you mean "just enough config to get the
slurmd to run". As far as I'm aware, you really need to have a complete
and matching config on each of your daemons - like slurmd literally won't
start with differing configs. There is the "NO_CONF_HASH" debug flag to
get around that check.
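In case it's useful, that flag can be set statically or flipped at runtime (standard syntax, nothing site-specific here):

# in slurm.conf
DebugFlags=NO_CONF_HASH

# or at runtime
scontrol setdebugflags +NO_CONF_HASH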