Hi,
If I understand correctly, the MUNGE and SACK authentication modules
require that no one can gain access to the key. This means that we
should not use our normal workstations to which our users have physical access
to run any jobs, nor could our users use the workstations to sub
Hi,
I am not sure whether I misunderstand the scheduling algorithm or whether this
is a bug. It seems similar to the bug described and fixed here:
https://github.com/SchedMD/slurm/commit/6a2c99edbf96e50463cc2f16d8e5eb955c82a8ab#diff-0e12d64dc32e4174fe827d104245d2d690e4c929a6cfd95a2d52f65683a6
Hi,
I am getting the following error in the logs whenever I run a few srun jobs in
a batch.
Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: debug:
_send_timeout: Socket POLLERR: Connection reset by peer
Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: error:
slurm_s