I am trying to find the GPU hour utilization for a user during a specific time
period using the sacct and sreport commands. However, I am noticing a
significant difference between the outputs of these two commands.
Could you explain the reasons for this discrepancy? Are there specific factors ...
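One common source of this kind of gap, offered as a hedged sketch rather than a full answer: sreport works from rolled-up accounting data and counts only the usage that falls inside the requested window, whereas sacct with -S/-E lists every job that ran during the window and, by default, reports each job's full elapsed time (sacct's `--truncate` option clips times to the window). sreport also reports minutes unless `-t hours` is given. The Python below assumes start/end times and AllocTRES strings have already been pulled from sacct (for example with `--format=Start,End,AllocTRES`); the helper names and sample jobs are hypothetical.

```python
from datetime import datetime


def gpus_from_tres(tres: str) -> int:
    """GPU count from an AllocTRES string, e.g. 'cpu=4,gres/gpu=2,node=1'."""
    for item in tres.split(","):
        key, _, value = item.partition("=")
        if key == "gres/gpu":  # skip typed entries such as gres/gpu:a100
            return int(value)
    return 0


def gpu_hours(jobs, window_start, window_end, clip=True):
    """Sum GPU-hours over (start, end, alloc_tres) tuples.

    clip=True counts only the part of each job inside the window (roughly the
    sreport view); clip=False counts each job's full run time (what summing
    sacct's default output gives).
    """
    total = 0.0
    for start, end, tres in jobs:
        if clip:
            start, end = max(start, window_start), min(end, window_end)
        seconds = max((end - start).total_seconds(), 0.0)
        total += seconds * gpus_from_tres(tres) / 3600
    return total


# Hypothetical data: one job straddles the window start, one is fully inside.
window_start = datetime(2024, 11, 2, 0, 0)
window_end = datetime(2024, 11, 3, 0, 0)
jobs = [
    (datetime(2024, 11, 1, 22, 0), datetime(2024, 11, 2, 2, 0), "cpu=8,gres/gpu=2,node=1"),
    (datetime(2024, 11, 2, 10, 0), datetime(2024, 11, 2, 11, 0), "cpu=4,gres/gpu=1,node=1"),
]
print(gpu_hours(jobs, window_start, window_end, clip=False),
      gpu_hours(jobs, window_start, window_end, clip=True))  # 9.0 5.0
```

The straddling job alone accounts for the difference: 8 GPU-hours unclipped versus 4 inside the window.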
Forget what I just said. slurmctld had not been restarted in a month of
Sundays and it was logging slurm.conf mismatches.
A Slurm reconfig and a restart of all slurmd daemons, and the problem looks fixed.
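A reconfig plus a restart of every slurmd only clears mismatch warnings if all nodes actually read the same slurm.conf. A quick consistency check, sketched under the assumption that each node's file contents have already been collected (for example copied back with clush; the function name and input layout are hypothetical), is to group nodes by a digest of the file:

```python
import hashlib
from collections import defaultdict


def group_by_digest(configs: dict[str, bytes]) -> dict[str, list[str]]:
    """Group node names by the SHA-256 digest of their slurm.conf contents."""
    groups = defaultdict(list)
    for node, content in configs.items():
        groups[hashlib.sha256(content).hexdigest()].append(node)
    return dict(groups)
```

More than one group in the result means at least one node holds a different copy of the file.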
On Sun, 10 Nov 2024 at 14:50, John Hearns wrote:
> I have cluster which uses Slurm 23.11.6
>
> When
See:
https://slurm.schedmd.com/pam_slurm_adopt.html#log_level
Try looking for logs in /var/log/secure.
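The page linked above describes a `log_level` option for the pam_slurm_adopt module; raising it on the `pam_slurm_adopt.so` line of the sshd PAM stack should make the module say more about why it rejects a login. A minimal sketch for pulling those messages out of a log such as /var/log/secure follows; it assumes the denial text quoted later in this thread also appears in the log, so adjust the pattern if it does not. The function name is hypothetical.

```python
def adopt_messages(lines):
    """Return the lines mentioning pam_slurm_adopt and how many are denials."""
    hits = [ln.rstrip("\n") for ln in lines if "pam_slurm_adopt" in ln]
    # Assumption: the denial message text is logged verbatim.
    denied = sum("Access denied by pam_slurm_adopt" in ln for ln in hits)
    return hits, denied
```

On a node, something like `adopt_messages(open("/var/log/secure", errors="replace"))` would do; the file is usually readable only by root.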
On Sun, Nov 10, 2024 at 9:54 AM John Hearns via slurm-users <slurm-users@lists.schedmd.com> wrote:
> I have cluster which uses Slurm 23.11.6
>
> When I submit a multi-node job and run someth
I have a cluster that uses Slurm 23.11.6.
When I submit a multi-node job and run something like
clush -b -w $SLURM_JOB_NODELIST "date"
very often the ssh command fails with:
Access denied by pam_slurm_adopt: you have no active jobs on this node
This happens on perhaps 50% of the nodes.
There is t
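$SLURM_JOB_NODELIST holds a compressed hostlist (for example `node[01-04]`), which clush expands on its own, and `scontrol show hostnames` expands it on the Slurm side. To try the nodes one at a time and see exactly which ones reject the login, the list first needs expanding. A minimal pure-Python sketch of that expansion, assuming at most one bracket group per host name (the real syntax allows more, so prefer `scontrol show hostnames` in practice); the function name is hypothetical:

```python
import re


def expand_hostlist(hostlist: str) -> list[str]:
    """Expand a simplified Slurm hostlist such as 'node[01-03,07],login1'."""
    # Split on commas that are not inside brackets.
    entries, depth, current = [], 0, ""
    for ch in hostlist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            entries.append(current)
            current = ""
        else:
            current += ch
    if current:
        entries.append(current)

    hosts = []
    for entry in entries:
        m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", entry)
        if not m:  # plain host name, no range
            hosts.append(entry)
            continue
        prefix, ranges, suffix = m.groups()
        for part in ranges.split(","):
            lo, _, hi = part.partition("-")
            hi = hi or lo
            width = len(lo)  # preserve zero padding, e.g. 01..03
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts
```

Each expanded name can then be tried individually, for example with `ssh -o BatchMode=yes <node> date`, to list the nodes where pam_slurm_adopt denies access.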