The call for abstracts for the 2018 Slurm User Group Meeting has been
extended until Friday, July 6, 2018.
You are invited to submit an abstract of a tutorial, technical presentation
or site report to be given at the Slurm User Group Meeting 2018. This event
is sponsored and organized by CIEMAT an
I've reported everything back to the actual sysadmin
of the cluster... and the truth behind this story is as
unbelievable as the story itself.
A savvy cluster user asked a "what is Linux?" kind of user
to submit 'his' watchdog script to improve the cluster
load.
Basically you get the f. out of
A great detective story!
> June15 but there is no trace of it anywhere on the disk.
Do you have the process ID (pid) of the watchdog.sh script?
You could look in /proc/(pid)/cmdline and see what that shows.
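
For anyone who wants to script that check, here is a minimal Python sketch,
assuming a Linux /proc filesystem. The helper names parse_cmdline and
read_cmdline are made up for illustration; the only fact relied on is that
/proc/(pid)/cmdline holds the process arguments separated by NUL bytes.

```python
from pathlib import Path


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    parts = raw.split(b"\0")
    # The kernel normally terminates the last argument with a NUL as well,
    # which leaves one empty trailing element after the split.
    if parts and parts[-1] == b"":
        parts.pop()
    return [part.decode(errors="replace") for part in parts]


def read_cmdline(pid: int) -> list[str]:
    """Return the argument list of a running process (Linux only)."""
    return parse_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())
```

Calling read_cmdline with the pid of the suspect process returns its argument
list, which should include the path the script was started from, even if the
file has since been deleted from disk. Reading another user's process may
require root.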
On 2 July 2018 at 11:37, Matteo Guglielmi wrote:
> Unbelievable... and got it by chance.
>
Unbelievable... and got it by chance.
Jobs were killed (again) at 21:04, and in the user's list of running
processes there was a 'sleep 5' command (13 hours + 53
minutes + 20 seconds) which was fired up exactly at the same
time.
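
As a side note on the duration quoted above, a small Python sketch of the
arithmetic (the function name to_seconds is only illustrative). It converts
13 hours + 53 minutes + 20 seconds into seconds; what argument the script
actually passed to sleep is not something this check can confirm.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an hours/minutes/seconds duration into total seconds."""
    return hours * 3600 + minutes * 60 + seconds


# 13 h = 46800 s, 53 min = 3180 s, plus 20 s
print(to_seconds(13, 53, 20))  # prints 50000
```
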
The watchdog.sh script (from which the sleep command is fired)
wa