Hi,
Since our users are not experienced with cluster environments or Slurm,
cleaning up after their mistakes takes up a lot of my time.

It seems that when a Fluent job crashes for some reason, or a user
closes the Fluent window without terminating the job, or closes the
terminal abruptly, and so on, the Fluent processes stay on the node even
though the job is no longer listed in the output of the squeue command.

Since the users cannot log in to the nodes, I have to kill these
processes manually. Is there a better way to handle this?
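
To make the manual step concrete, a minimal sketch of such a check in
Python could look like the following. It only reports candidates and
kills nothing. It assumes that regular accounts have UIDs of 1000 or
above, that ps is the Linux procps version, and that a user with no job
on the node should have no processes there. The function names and the
UID threshold are hypothetical.

```python
import socket
import subprocess

MIN_USER_UID = 1000  # assumption: regular user accounts start at UID 1000


def find_orphans(ps_output, active_uids, min_uid=MIN_USER_UID):
    """Return (pid, uid, command) for user processes whose owner has no job here.

    ps_output is the text printed by `ps -eo pid=,uid=,comm=`.
    active_uids is the set of UIDs that squeue reports for this node.
    """
    orphans = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, uid, comm = int(fields[0]), int(fields[1]), fields[2].strip()
        # System accounts (below min_uid) are ignored; so are users with a job.
        if uid >= min_uid and uid not in active_uids:
            orphans.append((pid, uid, comm))
    return orphans


def orphans_on_this_node():
    """Query squeue and ps on the local node and list leftover processes."""
    # Assumption: the Slurm node name equals the short hostname.
    node = socket.gethostname().split(".")[0]
    squeue = subprocess.run(
        ["squeue", "-h", "-w", node, "-o", "%U"],  # one UID per job on this node
        capture_output=True, text=True, check=True,
    )
    active_uids = {int(u) for u in squeue.stdout.split()}
    ps = subprocess.run(
        ["ps", "-eo", "pid=,uid=,comm="],  # no header: pid, uid, command name
        capture_output=True, text=True, check=True,
    )
    return find_orphans(ps.stdout, active_uids)
```

Calling orphans_on_this_node() on a compute node returns the suspect
processes. An administrator running it from a non-root account would see
their own shell in the list too, so the result needs a look before
anything is killed.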



Regards,
Mahmood