Hello everyone.
I installed Slurm 20.11.9, and MariaDB, slurmdbd, and slurmrestd all came up normally.
After issuing a JWT token, I ran some API tests:
/slurm/v0.0.36/ping
/slurm/v0.0.36/diag
/slurm/v0.0.36/jobs
/slurm/v0.0.36/job/submit
I tested these APIs and there were no particular problems.
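For anyone who wants to reproduce the test, here is a minimal Python sketch. It assumes slurmrestd is listening on TCP localhost:6820 and that the token from "scontrol token" is exported as SLURM_JWT (both are assumptions; adjust for your setup):

    import os
    import requests  # third-party: pip install requests

    # Assumed setup: slurmrestd on TCP localhost:6820, JWT exported
    # as SLURM_JWT after running "scontrol token".
    BASE = "http://localhost:6820"
    HEADERS = {
        "X-SLURM-USER-NAME": os.environ["USER"],
        "X-SLURM-USER-TOKEN": os.environ["SLURM_JWT"],
    }

    for endpoint in ("/slurm/v0.0.36/ping",
                     "/slurm/v0.0.36/diag",
                     "/slurm/v0.0.36/jobs"):
        r = requests.get(BASE + endpoint, headers=HEADERS)
        print(endpoint, r.status_code)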
Hello Everyone,
When I started slurmctld for the first time, the following error messages were displayed:
...
slurmctld: error: Could not open node state file /var/spool/slurm/ctld/node_state: No such file or directory
slurmctld: error: NOTE: Trying backup state save file. Information may be lost!
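While debugging, I used this quick Python check to confirm the state save directory exists and is writable. It is only a sketch: it assumes StateSaveLocation is /var/spool/slurm/ctld, as in the log above, and that slurmctld runs as the user "slurm":

    import os
    import pwd
    import stat

    # Assumptions: StateSaveLocation is /var/spool/slurm/ctld (from the
    # log above) and slurmctld runs as user "slurm"; adjust as needed.
    STATE_DIR = "/var/spool/slurm/ctld"
    SLURM_USER = "slurm"

    st = os.stat(STATE_DIR)  # raises FileNotFoundError if the dir is missing
    owner = pwd.getpwuid(st.st_uid).pw_name
    print("owner:", owner, "(want:", SLURM_USER + ")")
    print("owner-writable:", bool(st.st_mode & stat.S_IWUSR))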
That does help, thanks for the extra info.
If I have two separate GPU cards in the node, and I set up 7 MIGs on each card,
for a total of 14 MIG "gpus" in the node... then SHOULD I be able to salloc
requesting, say, 10 GPUs (7 from one card, 3 from the other)? Because I can't.
I can request up to
Hi Hans,
We run Slurm in k8s at ETH Zurich to manage physical compute nodes.
The link you included and Nicolas's follow-up already cover the basics.
We build several Docker containers based on CentOS 7 (for now) with
Slurm compiled from source for the following services:
* slurmdbd
Hi,
From what we observed, Slurm sees each MIG as a distinct gres/gpu, so
you can have 14 jobs, each using a different MIG.
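As a quick sanity check, each job can print what it was handed. This is just a sketch; it assumes your gres setup has Slurm export the assigned MIG instance through CUDA_VISIBLE_DEVICES, which is how it behaves for us:

    import os

    # Assumption: Slurm's gres/gpu plugin exports the job's assigned
    # MIG instance via CUDA_VISIBLE_DEVICES (one MIG per job here).
    print("CUDA_VISIBLE_DEVICES =",
          os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))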
However (unless something has changed in the past year), due to NVIDIA
limitations, a single process can't access more than one MIG simultaneously
(this is unrelated to