[slurm-users] Submitting from an untrusted node

2024-05-14 Thread Rike-Benjamin Schuppner via slurm-users
Hi, If I understand it correctly, the MUNGE and SACK authentication modules naturally require that no one can get access to the key. This means that we should not use our normal workstations, to which our users have physical access, to run any jobs, nor could our users use the workstations to sub

[slurm-users] srun only runs one job on a node

2024-01-27 Thread Rike-Benjamin Schuppner
Hi, I am not sure whether I misunderstand the scheduling algorithm or whether this is a bug. The bug seems similar to the one described and fixed here https://github.com/SchedMD/slurm/commit/6a2c99edbf96e50463cc2f16d8e5eb955c82a8ab#diff-0e12d64dc32e4174fe827d104245d2d690e4c929a6cfd95a2d52f65683a6

[slurm-users] slurmctld: slurm_bufs_sendto(msg_type=SRUN_STEP_SIGNAL) failed: Connection reset by peer

2024-01-25 Thread Rike-Benjamin Schuppner
Hi, I am getting the following error in the logs whenever I run a few srun jobs in a batch.
Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: debug: _send_timeout: Socket POLLERR: Connection reset by peer
Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: error: slurm_s