* Rafał Kędziorski <rafal.kedzior...@gmail.com> [190927 14:58]:
> > >
> > > you may try setting `ReturnToService=2´ in slurm.conf.
> > >
> >
> > Caveat: A spontaneously rebooting machine may create a "black hole" this
> > way.
> >
>
> How do you mean this? Could ReturnToService=2 be a problem?
>

Hi Rafał,

black hole syndrome happens when a node keeps accepting new jobs and
then causes each of them to fail. Since every failed job frees the node
again almost immediately, this may even flush all jobs from the queue
for no obvious reason.

As Steffen said, this scenario may also happen if a node accepts a job,
then spontaneously reboots, then accepts the next job, then reboots
again, ...

> > Max Planck Institute for Gravitational Physics (Albert Einstein Institute)

That makes a somewhat funny element in this context. ;-)

Best regards
Jürgen

-- 
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471